DEFECT INSPECTION METHOD



BACKGROUND OF THE INVENTION 

The present invention relates to exterior 
inspection for detecting defects of patterns being 
examined, and particularly to a defect inspection method 
and apparatus for inspecting patterns in a semiconductor 
wafer or liquid crystal display. 

In a conventional inspection apparatus of this 
kind, as disclosed in JP-A-55-74409, an image sensor 
such as a line sensor is used to detect the image of a 
pattern being examined while the pattern is being moved, 
and the detected image signal is compared in its 
gradation with another image signal delayed by a 
predetermined time, so that the inconsistency in the 
comparison can be recognized as a defect. 
In addition, in another example disclosed in
JP-2B-8-10463, two images are arranged in a row and
compared with each other.

The above conventional defect recognition
methods will be described in detail with reference to
Figs. 1, 2, 3 and 4. Fig. 1 is a schematic diagram of
memory mats and peripheral circuits in a memory chip of
the pattern being inspected in the prior art. Fig. 2 is
a histogram of the brightness of the memory mats and
peripheral circuits of the memory chip shown in Fig. 1.
Fig. 3 is a schematic diagram of a pattern being
examined that has been flattened by CMP (chemical
mechanical polishing).

A semiconductor wafer has formed thereon a
large number of memory chips 20, one of which is
illustrated in Fig. 1. The memory chip 20 can be
divided roughly into memory mats 21 and peripheral
circuits 22. Each of the memory mats 21 is a group of
small repetitive patterns (cells), and the peripheral
circuits 22 are fundamentally a group of random
patterns. In most cases, if each memory mat is observed
in detail, it can be recognized as a group of a plurality
of patterns repeated at different cell pitches.

Fig. 2 illustrates the distribution of the
brightness of the memory mats 21 and peripheral circuits
22 in Fig. 1, that is, the frequency (histogram) with
respect to the brightness of a memory chip expressed by
ten bits, or in 1024 gradations at maximum. The memory
mats 21 have a high pattern density and are generally
dark. The peripheral circuits 22 have a low pattern
density and are generally bright.

In the flattening process such as CMP shown in
Fig. 3, the brightness of the circuit pattern within the
memory mat 21 changes with the pattern thickness, as
will be understood from the histogram of Fig. 4. This
figure shows the case in which the wiring layers are
deposited and then flattened by CMP. In this pattern,
the film thickness changes locally, easily causing
irregular brightness. In the case of such a pattern, the
brightness values on the patterns shown in Figs. 2 and 3
are compared. If a threshold is set so as not to
erroneously detect the brightness difference, the
sensitivity of defect detection is extremely reduced.
This brightness difference can be cancelled out to some
extent if a wide wavelength band is used for
illumination. However, because the pattern after CMP
sometimes has a great change in brightness, there is a
limit to this approach. Therefore, it has been desired
to devise means for detecting minute defects from a
pattern having irregular brightness.

Also, in a conventional example, the sum of
the squares of the differences between corresponding
parts of two pictures is calculated and fitted to a
paraboloid so that a positional shift between the
pictures can be detected. This method, however, does
not assure that the two images to be compared are
coincident. Thus, optimum matching has been desired for
the comparison. Fig. 5 shows experimental results of
calculating the sum of the squares of the differences of
corresponding pixels of two pictures (f(x, y) in Fig. 13
in the later description), of which one picture is
shifted by ±1 pixel in the x and y directions. The
abscissa indicates the x direction, and the ordinate the
y direction. Each value illustrated in the figure is the
sum of the squares of the differences. Here, the same
pictures (f(x, y) in Fig. 13) are used. That is,
$\sum (f(x, y) - f(x \pm 1, y \pm 1))^2$ is calculated as
the sum of the squares of the differences. From Fig. 5
it will be seen that the sums of the squares of the
differences even between the same pictures are not
symmetrical with respect to the center (0, 0), but have
an asymmetry of about 0.6%. Since the same pictures, one
of which is shifted, are used, the sum of the squares of
the differences is 0 at the point (0, 0). Therefore,
even if the position where the sum of the squares of the
differences is the minimum is calculated with a
resolution of pixel size or below by fitting a
paraboloid to this data, the correct positional shift,
here (0, 0), cannot be detected.

Also, brightness is changed on the wafer after
the flattening process such as CMP. The effect of this
brightness change is illustrated in Fig. 6. Here, two
pictures are used, one of which has 1.1 times the
brightness of the other. A brightness 1.1 times
higher corresponds to the usual brightness change on a
CMP wafer or less. Each value in the experimental
results of Fig. 6 is the sum of the absolute values of
the differences. The position where the minimum value
is located is (0, 1). Thus there is a great error at
the pixel level, to say nothing of a resolution of a
pixel or below. The sum of the squares of the
differences has the same tendency. From these data, it
will be understood that the positional shift between
pictures cannot be found precisely. Of course, for a
brightness 1.05 times higher there is the same tendency.
Thus, fitting a paraboloid to the sum of the squares of
the differences and calculating the position where the
minimum value is obtained must be said to be a means
having very large error.

SUMMARY OF THE INVENTION 

Accordingly, it is an object of the invention 
to provide a pattern defect inspection method and 
apparatus with the above problems solved, and capable of 
examining by comparing patterns of different brightness 
so that defects can be inspected with high sensitivity 
and high reliability at all times. 

In addition, it is another object of the
invention to provide a pattern defect inspection method
and apparatus using a high-precision picture matching
process.

Moreover, it is still another object of the 
invention to provide a pattern defect inspection method 
and apparatus capable of detecting with high sensitivity 
even for a wafer pattern after CMP. 

In order to achieve the above objects, according
to the invention, there is provided a method of
inspecting defects of a plurality of patterns formed to 
be naturally the same on a substrate, wherein a first 
pattern being inspected is detected as a first image 
which is then stored, a second pattern being inspected 
is detected as a second image, and the second image is 
matched in brightness to the first image stored, and 
then compared with the first image so that the patterns 
can be inspected. 



Moreover, according to the invention, there is 
provided a method of inspecting defects of a plurality 
of patterns formed to have naturally the same shape and 
flattened in their surfaces, wherein a first pattern 
being inspected is optically picked up as a first image 
signal and stored, a second pattern being inspected is 
optically picked up as a second image signal, at least 
one of the first image signal stored and the second 
image signal is locally changed in gradation, and the 
first and second image signals are compared so that the 
patterns can be inspected. 

In addition, according to the invention, there
is provided a method of inspecting defects of a plurality
of patterns formed to be naturally the same on a
substrate, wherein a first pattern being inspected is 
detected as a first image and stored, a second pattern 
being inspected is detected as a second image, the first 
image stored and the second image are corrected for 
their positional shift with an accuracy of pixel unit, 
the brightness of one or both of the corrected first and 
second images is changed, the first and second images 
changed in brightness as above are compared so that the 
inconsistency between the first and second images is 
detected as a defect, and the detected result is 
displayed. 

Thus, according to the invention, the
certainty of inconsistent information can be judged by
using a scatter diagram of two detected images to be
compared. In addition, since defects are detected by
using information from the scatter diagram, the
inspection can be made highly reliable. Moreover, use
of the scatter diagram makes it possible to decide an
appropriate threshold. Also, by using the certainty of
inconsistent information, it is possible to review
defects effectively.

Therefore, inspection data can be used together with
a measure of its reliability. Furthermore, defects can be
detected with high sensitivity without reducing the 
total inspection sensitivity by the brightness 
difference due to the change of the film thickness of a 
multilayer pattern. Therefore, in the manufacturing 
process of semiconductor devices, defects of patterns of 
a wafer after CMP can be detected with high precision 
and high reliability. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a schematic of the memory mats and 
peripheral circuits in a memory chip of which the 
pattern is to be inspected. 

Fig. 2 is a histogram of brightness in the 
memory mats and peripheral circuits of the memory chip. 

Fig. 3 is a diagram to which reference is made 
in explaining the flow of CMP. 

Fig. 4 is a histogram of brightness in the
memory mats and peripheral circuits of a different
memory chip after CMP.

Fig. 5 is a diagram showing the sum of the
squares of the differences between two pictures.

Fig. 6 is a diagram showing the sum of the
absolute values of the differences between two pictures.

Figs. 7 and 8 are block diagrams of pattern
defect inspection apparatus according to one embodiment
of the invention.

Fig. 9 is a detailed block diagram of an image
brightness coincidence filter operation unit 12 in Figs.
7 and 8.

Fig. 10 shows an example of a twin filter. 

Fig. 11 is a diagram to which reference is
made in explaining the operation of the image brightness
coincidence filter operation unit 12.

Fig. 12 is a detailed block diagram of a local
gradation converter 13.

Figs. 13A-13C show examples of detected images
and difference image according to the invention.

Figs. 14A-14B, 15A-15B, 16A-16B, 17A-17B and
18A-18B show examples of the gradation conversion
according to the invention.

Figs. 19 and 20 are diagrams to which
reference is made in explaining threshold setting
systems.

Figs. 21-23 are scatter diagrams of local
contrast at each image processing step on two pictures
being compared.

Fig. 24 is a block diagram of a pattern defect
inspection apparatus according to another embodiment of
the invention.

Fig. 25 is a block diagram of a threshold
computation circuit 48.

Fig. 26 is a block diagram of a pattern defect
inspection apparatus according to another embodiment of
the invention.

Fig. 27 is a diagram to which reference is
made in explaining the scatter diagram production 24 and
display 25 according to the embodiment of the invention.

Figs. 28 and 29 are diagrams showing the
results at each image processing step on two pictures
being compared.

Figs. 30-32 are scatter diagrams at each image
processing step on two images being compared.

Figs. 33-37 are examples of scatter diagrams. 

Fig. 38 is a partially cross-sectional block
diagram of a pattern defect inspection apparatus
according to still another embodiment of the invention.

Fig. 39 is a diagram showing the scatter
diagram production and display according to the
embodiment of the invention.

Figs. 40A and 40B are diagrams to which
reference is made in explaining the local gradation
conversion according to the embodiment of the invention.

Fig. 41 is a block diagram to which reference
is made in explaining the scatter diagram production and
display according to the embodiment of the invention.

Fig. 42 is a diagram showing the results at
each image processing step on two pictures being
compared.

Figs. 43A and 43B are scatter diagrams.

Figs. 44A-44C show examples of output lists
for defects.

Fig. 45 is a diagram to which reference is
made in explaining the amount of positional shift
between pictures.

Fig. 46 is a diagram to which reference is
made in explaining spectrum analysis.

DESCRIPTION OF THE EMBODIMENTS 

Some embodiments of the invention will be
described with reference to the accompanying drawings.

[Embodiment 1]

Figs. 7 and 8 are block diagrams of pattern 
defect inspection apparatus according to the first 
embodiment of the invention. 

It is assumed that this embodiment inspects,
for example, patterns of a semiconductor wafer.

Referring to Figs. 7 and 8, there are shown an
image sensor 1 that responds to the brightness or
gradation of light reflected from a semiconductor wafer
4 having patterns being inspected to produce a gradation
image signal, an A/D converter 2 for converting the
gradation image signal from the image sensor 1 into a
digital image signal 9, a delay memory 3 for delaying
the gradation image signal, and the semiconductor wafer
4 having patterns being inspected. There are also shown
a stage 5 that is moved in the X-direction, Y-direction,
Z-direction and θ-direction (rotation) with the
semiconductor wafer 4 placed thereon, an object lens 6
facing the semiconductor wafer 4, a light source 7 for
illuminating the semiconductor wafer 4 having the
patterns being inspected, a half mirror 8 for reflecting
the illumination light and supplying it through the
object lens 6 to the semiconductor wafer 4 and at the
same time allowing the reflected light from the
semiconductor wafer 4 to pass therethrough, and the
digital image signal 9 into which the gradation image
signal is converted by the A/D converter 2. Thus the
light from the light source 7 for illumination is
reflected to provide, for example, bright field
illumination on the semiconductor wafer 4 through the
object lens 6.

The delay memory 3 may be a delay memory for
storing and delaying the image signal 9 by a pitch of
one cell or a plurality of cells repeated, or may be
another delay memory for storing and delaying the image
signal 9 by a pitch of one chip or a plurality of chips
repeated.

In addition, a block 11 is used to align the
digital image signal 9 and a delayed digital image
signal 10, or here to detect the amount of shift at
which the minimum gradation difference can be obtained
with a precision of pixel unit, and shift one picture on
the basis of this amount of shift so as to align the two
pictures. Here, the images are continuously detected by
the image sensor, but divided at, for example, every 256
lines (the number of lines is determined according to
the method described later), and the images of this unit
are aligned. A block 12 is a brightness converter for
converting both image signals that are different in
brightness so that the brightness of one image signal
equals that of the other. Here, the whole of each image
is passed through a filter at one time so that the
brightness of one image coincides with that of the other.

A block 13 is a gradation converter for 
converting the gradations of both image signals that are 
different in brightness so that the brightness of one 
image can be coincident with that of the other. Here, 
linear conversion is performed for each pixel by gain 
and offset so that the brightness coincidence can be 
achieved. The image signals from the gradation 
converter 13 are compared by a comparator 14, and the 
inconsistency can be detected as a defect. 

The detected image signal is serially 
processed by a pipeline-type image processing system, 
and finally a defect and its features are produced. 

Although bright field illumination is employed
in the above example, the light source is not limited
thereto, but may be an arbitrary one if it can be used
as microscope illumination, such as dark field
illumination or ring band illumination. Illumination
by an electron beam can of course also be used.

The inspection may be performed a plurality of
times with these illumination conditions changed so that
the logical sum of the results from the plurality of
inspection operations can be employed as the final
result. Alternatively, it is possible that the logical
product thereof is employed to confirm the defect and
that process diagnosis may be made by, for example, the
distribution of defects or number of defects. In this
case, the review for visual observation of inconsistent
portions is not necessary, and thus the operation can be
simplified and facilitated.
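
As an illustrative sketch only, the logical sum and logical product of the results of several inspection operations might be formed as follows. The function name and the representation of each result as a boolean defect map are assumptions made for this example and are not taken from the apparatus described.

```python
import numpy as np

def combine_defect_maps(maps, mode="or"):
    """Combine per-illumination defect maps (hypothetical helper).

    maps: list of equally sized arrays, nonzero where a defect was reported.
    mode: "or"  -> logical sum (a defect in any inspection is kept),
          "and" -> logical product (only defects seen in every inspection).
    """
    maps = [np.asarray(m, dtype=bool) for m in maps]
    if mode == "or":
        return np.logical_or.reduce(maps)
    if mode == "and":
        return np.logical_and.reduce(maps)
    raise ValueError("mode must be 'or' or 'and'")
```

The logical sum favors sensitivity, while the logical product keeps only the defects confirmed under every illumination condition.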

The operation of the inspection apparatus
constructed as above will be described with reference to
Figs. 7-12. The order of processes in Fig. 7 is
different from that in Fig. 8.

Referring to Figs. 7 and 8, the stage 5 is
moved at a constant speed in the X-direction so that the
illumination light focused by the object lens 6 scans
the necessary region of the patterns of the semiconductor
wafer 4 being inspected, while the image sensor 1
detects the brightness information (gradation image
signal) of the pattern formed on the semiconductor wafer
4, or of the memory mats 21 and peripheral circuits 22
within the chip 20.

After the completion of one-row movement, the
stage 5 moves at high speed to the next row
in the Y-direction and positions itself. In other
words, uniform movement and fast movement are repeated
for the inspection. Of course, step and repeat type
inspection may be performed. Then, the A/D converter 2
converts the output (gradation image signal) from the
image sensor 1 into the digital image signal 9. This
digital image signal 9 has a format of 10 bits.
Although the image processing can be performed
without particular problem even if the signal has about
6 bits, a larger number of bits than that is
necessary for the detection of minute defects.

First, the pixel-unit alignment between images
will be described. In this alignment, one of two
pictures to be compared is shifted pixel by pixel from
the other while the gradation difference (the difference
between each pixel of one picture and the corresponding
pixel of the other) is calculated, and the amount of
shift at which the gradation difference is the minimum
is found. The range of shift between pictures to be
detected is set, for example, within ±3 pixels at
maximum, but is changed according to the design rule of
the pattern. Thus, the two pictures are aligned by
shifting one picture by the obtained amount of shift.



A method for the alignment will be described
below.

$$S(\Delta x, \Delta y) = \sum |f(x, y) - g(x - \Delta x, y - \Delta y)| \qquad (1)$$

The shift detection is performed by detecting
Δx, Δy when the above S(Δx, Δy) becomes the minimum.
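
The search for the minimum of S in expression (1) can be sketched as below. This is a minimal illustration under stated assumptions, not the hardware of block 11: the function name `sad_shift` is hypothetical, the images are NumPy arrays indexed as [y, x], and the sum is taken over a fixed interior window so that every candidate shift uses the same pixels.

```python
import numpy as np

def sad_shift(f, g, max_shift=3):
    """Return ((dx, dy), S) minimising S(dx, dy) = sum |f(x, y) - g(x - dx, y - dy)|.

    The search range of +/- max_shift pixels follows the example in the text.
    """
    f = np.asarray(f, dtype=np.int64)
    g = np.asarray(g, dtype=np.int64)
    m = max_shift
    h, w = f.shape
    core = f[m:h - m, m:w - m]            # fixed window of f
    best, best_s = (0, 0), None
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            # g(x - dx, y - dy) sampled on the same window (rows are y, columns are x)
            shifted = g[m - dy:h - m - dy, m - dx:w - m - dx]
            s = int(np.abs(core - shifted).sum())
            if best_s is None or s < best_s:
                best, best_s = (dx, dy), s
    return best, best_s
```

The returned shift is an integer number of pixels only, which corresponds to the pixel-unit precision of this alignment step.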



However, since the position satisfying the
minimum is obtained only when the picture is shifted
pixel by pixel, this position is added with an offset
depending on whether the true position is near to Δx or
Δy.

According to the expressions given below, Δx
and/or Δy are added with 1 or nothing, that is,

if S(1, 0)+S(1, -1)+S(0, -1) is the minimum, then Δx++ ... (2)

if S(-1, 0)+S(-1, -1)+S(0, -1) is the minimum, then
nothing ... (3)

if S(-1, 0)+S(-1, 1)+S(0, 1) is the minimum, then Δy++ ... (4)

and if S(1, 0)+S(1, 1)+S(0, 1) is the minimum, then Δx++,
Δy++ ... (5)

where Δx++ means Δx=Δx+1.

Thus, two pictures can always be aligned by
shifting one picture by the obtained amount of shift.
In other words, a picture f is always shifted to the
upper right to be a new picture f'. The movement
direction can be limited to one of four directions
(lower right, upper left, lower left and upper right).
This leads to the simplification of hardware.

Fig. 9 is a detailed block diagram of the
brightness coincidence filter operation unit 12. First,
filters F, F' are found that make the following
expression the minimum within two pictures f(x, y), g(x,
y) that are aligned with accuracy of pixel unit.

$$\sum (F * f(x, y) - F' * g(x, y))^2 \qquad (6)$$

The filters F, F' have a size of, for example, 2x2 pixels.



Fig. 10 shows examples of filters. The
filters F and F' are symmetrical, forming a twin as
illustrated. If the filters are of the twin type, the
coefficients of the filter parameters can be solved by
using the method of least squares.

$$\alpha = \frac{(\sum\sum C_0 C_y)(\sum\sum C_x C_y) - (\sum\sum C_0 C_x)(\sum\sum C_y C_y)}{(\sum\sum C_x C_x)(\sum\sum C_y C_y) - (\sum\sum C_x C_y)(\sum\sum C_x C_y)} \qquad (7)$$

$$\beta = \frac{(\sum\sum C_0 C_x)(\sum\sum C_x C_y) - (\sum\sum C_0 C_y)(\sum\sum C_x C_x)}{(\sum\sum C_x C_x)(\sum\sum C_y C_y) - (\sum\sum C_x C_y)(\sum\sum C_x C_y)} \qquad (8)$$

where

$$C_0 = f(x, y) - g(x, y) \qquad (9)$$

$$C_x = \{f(x+1, y) - f(x, y)\} - \{g(x-1, y) - g(x, y)\} \qquad (10)$$

$$C_y = \{f(x, y+1) - f(x, y)\} - \{g(x, y-1) - g(x, y)\} \qquad (11)$$

This system filters the two pictures and makes
the square error of the gradation the minimum to reach
coincidence. No repetitive computations are necessary;
a single calculation achieves the object.
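
The closed-form solution of expressions (7) to (11) can be sketched as follows. The sketch assumes that the twin filters act as linear interpolators, so that the filtered difference is approximately C0 + αCx + βCy and the least-squares problem is the minimisation of the sum of its squares. The function names are hypothetical, and the images are NumPy arrays indexed as [y, x].

```python
import numpy as np

def twin_terms(f, g):
    """Return C0, Cx, Cy of expressions (9)-(11) on the interior pixels."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    fc, gc = f[1:-1, 1:-1], g[1:-1, 1:-1]
    c0 = fc - gc                                      # (9)
    cx = (f[1:-1, 2:] - fc) - (g[1:-1, :-2] - gc)     # (10): f(x+1, y), g(x-1, y)
    cy = (f[2:, 1:-1] - fc) - (g[:-2, 1:-1] - gc)     # (11): f(x, y+1), g(x, y-1)
    return c0, cx, cy

def twin_filter_coefficients(f, g):
    """Solve for alpha, beta in a single pass, as in expressions (7) and (8)."""
    c0, cx, cy = twin_terms(f, g)
    sxx, syy, sxy = (cx * cx).sum(), (cy * cy).sum(), (cx * cy).sum()
    s0x, s0y = (c0 * cx).sum(), (c0 * cy).sum()
    den = sxx * syy - sxy * sxy
    alpha = (s0y * sxy - s0x * syy) / den             # (7)
    beta = (s0x * sxy - s0y * sxx) / den              # (8)
    return alpha, beta
```

Because only five sums of products are accumulated, no iteration is involved, which is consistent with the single calculation mentioned in the text.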

The feature of this system is that the filter
coefficients α, β are found so that the gradations of
two pictures can be well coincident in terms of square
error minimum. In particular, these parameters do not
necessarily indicate the true amount of shift of
picture. For example, as described about the prior art,
it can be considered to fit a paraboloid to S(Δx, Δy),
calculate the minimum gradation difference position, and
then find interpolating pixels by interpolation on the
basis of this calculated position. In this case, there
is no rule or condition to be met for the brightness,
and thus it is not guaranteed that the obtained
pictures can be used for the comparative inspection. In
addition, under a different brightness, it is not clear
what the computed shift shows. In addition, even if the
minimum gradation difference position calculated by
approximation with a paraboloid is coincident with that
obtained according to the system used in this embodiment,
the produced pictures to be compared are not coincident.

The proposed matching system assures that the
sum of the squares of the differences between the
brightness values of two pictures becomes the minimum.
In this point this system is different from the other
systems. As illustrated in Fig. 11, because of linear
approximation, the coefficient α of the filter has error
for a positional shift. However, the obtained brightness
values are coincident. This system can substantially
reduce the gradation difference between images, and thus
it is well suited to the comparative inspection.

Moreover, the filter coefficients α, β can be
calculated analytically without repetitive computation,
and thus this system is suitable to be formed as
hardware.

Fig. 12 is a detailed block diagram of the
local gradation converter 13. The two pictures f(x, y),
g(x, y) that are aligned with accuracy of pixel unit and
produced from the brightness coincidence filter
operation unit are processed so that parameters a, b (a:
gain, b: offset) can be produced which make the
following expression the minimum within a certain area
of the pictures.

$$\sum (f(x, y) - a \cdot g(x, y) - b)^2 \qquad (12)$$

The parameters a, b can be calculated by partially
differentiating the above expression with respect to a,
b and making the resulting expressions equal to zero.
For example, the certain area is a range of 7x7 pixels
around each point.

The g(x, y) as one of the image signals is
converted by use of the obtained parameters into

$$a \cdot g(x, y) + b \qquad (13)$$

Thus, pictures coincident in brightness can be obtained.
The parameters a, b can take different values for each
position (x, y).

$$a = \frac{\sum f(x, y) g(x, y) - \sum f(x, y) \sum g(x, y) / MN}{\sum g(x, y) g(x, y) - \sum g(x, y) \sum g(x, y) / MN} \qquad (14)$$

$$b = \left(\sum f(x, y) - a \sum g(x, y)\right) / MN \qquad (15)$$

where MN is the number of pixels in the range of Σ.
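
A minimal sketch of the gain and offset computation of expressions (14) and (15) for one pixel is given below. The function name, the `half` parameter (3 gives a 7x7 window) and the [y, x] array indexing are assumptions for illustration; the window is assumed to lie wholly inside the image, and no weighting is applied.

```python
import numpy as np

def local_gain_offset(f, g, y, x, half=3):
    """Return gain a and offset b minimising sum (f - a*g - b)^2 around (y, x)."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    fw = f[y - half:y + half + 1, x - half:x + half + 1]
    gw = g[y - half:y + half + 1, x - half:x + half + 1]
    mn = fw.size                                   # MN, number of pixels summed
    sf, sg = fw.sum(), gw.sum()
    a = ((fw * gw).sum() - sf * sg / mn) / ((gw * gw).sum() - sg * sg / mn)  # (14)
    b = (sf - a * sg) / mn                                                   # (15)
    return a, b
```

The converted value at the pixel is then a*g(x, y)+b, as in expression (13). The denominator of (14) vanishes when g is constant inside the window, so a practical version would need to guard against that case.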

In addition, within the range of Σ, the
brightness of the aimed center pixel is compared with
that of the surrounding pixels. If the brightness
values of those pixels are greatly different, it will be
better not to add those values.

Alternatively, the addition itself is made,
but it will be effective to weight the values before the
addition, thereby lowering their contribution. For
example, if the brightness of the aimed pixel at (x, y)
is represented by c, and that of another pixel within
the range of Σ by d, then the weight W(x, y) can be
expressed by

$$W(x, y) = \max[1 - (c - d)^2 / (D \cdot D), 0] \qquad (16)$$

where max[ ] is the maximum value detection, the
brightness c, d is of 8 bits gradation, and D is a
constant.
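
Expression (16) can be written directly as a small function. The sketch below is illustrative only: the function name is hypothetical, and the default value of D is merely an example (the experiment described later uses D=70).

```python
import numpy as np

def weight(c, d, D=70.0):
    """Weight of expression (16): 1 when d equals c, falling to 0 once |c - d| >= D."""
    c = np.asarray(c, dtype=float)
    d = np.asarray(d, dtype=float)
    return np.maximum(1.0 - (c - d) ** 2 / (D * D), 0.0)
```

Passing the whole window as d and the center brightness as c yields one weight per surrounding pixel, since the expression is evaluated elementwise.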

Thus, if the brightness of the aimed center
pixel is similar to that of the surrounding pixels, the
weight is selected to be substantially equal to 1. If
it is not similar, the weight is smaller than 1.
Although D is a constant, it may be changed according to
the brightness, or D=func(c). Moreover, decision is
made of whether or not the pixel belongs to the same
pattern. If the average brightness of different
patterns is represented by μ, D may be given by D=|c-μ|.
If there are three or more different patterns, D may be
selected to be the difference between similar patterns.
Of course, it is not necessary to stick to this form.
Other means may be used if weights are properly
provided.

Figs. 13A and 13B show examples of two
detected images. The two detected images f(x, y), g(x,
y) have different brightness as illustrated. The two
images were aligned with precision of pixel unit, and
subjected to the brightness coincidence filter
operation. However, since these images have an
excessively large difference in brightness, a great
inconsistency is caused in the difference image as
illustrated in Fig. 13C. This image was subjected to
the gradation conversion process.

Figs. 14A-16B show examples of the process.
That is, Figs. 14A-16A and 14B-16B illustrate two
detected images g(x, y), f(x, y), converted image a*g(x,
y)+b, and their brightness histograms, respectively.
Here, D was selected to be 70, or D=70.

As will be understood from the histogram shown
in Fig. 14B, the value D corresponds to the difference
between the average brightness values of the two
distributions of the double-humped histogram. In
other words, the weight W with this D serves as an index
for indicating whether or not the brightness belongs to
the same distribution. The decided area is the range of
7x7 pixels around each point. From Figs. 14A-16B, it
will be seen that the brightness histograms are made
substantially equal by the conversion. Here, in the
experiment on the images shown in Figs. 14A-16A, the
parameters a=1.41, b=0 were obtained at certain
points within the images. Thus it will be
understood that the brightness gains of the images are
greatly different (41%).

From the above example, it can be considered
that the offset b is always fixed to 0, and that the
gain is made variable. The offset and gain may be
determined according to the characteristics of patterns
to be considered and apparatus structure.

Figs. 17A, 17B and 18A, 18B show the differences
between the images obtained by the conversion. In
the first three images of Figs. 17A, 17B and 18A, 18B,
the decided areas are the ranges 3x3, 5x5, 7x7 around
each point. At this time, the weight is equal to 1, or
W(x, y)=1. In addition, in the last image, the decided
area is the range 7x7, and the weight depends on the
above-mentioned W(x, y). From these figures, it will be
seen that when the area is small, the brightness values
are locally added and that the inconsistency between
images becomes small. The allowance of brightness is
extended, but minute defects will be missed. Therefore,
it is necessary to spread the area according to the
defects being detected. However, if the weight is fixed
to 1, the boundary between the patterns will be detected
as inconsistency, or false report. If weighting is
made, the effect of the boundary is reduced, two images
are substantially equal in brightness, and a minute
defect can be detected.

The area such as 7x7 pixels is not necessarily 
square, but may be a rectangle, polygon or circle. The 
area is not limited to such very small regions, but may 
be a region as large as (hundreds of pixels) x (hundreds 
of pixels). In short, the area may be within a range in 
which the brightness variation can be absorbed. 

The weight can also be selected to be 0 when 
the brightness difference between the aimed center pixel 
and the peripheral pixels is larger than a threshold. 

In addition, the following gradation
conversion can be considered.

$$W(x, y) (\sigma_f / \sigma_g)(g(x, y) - m_g) + m_f \qquad (17)$$

where σf, σg and mf, mg are the standard deviations and
average values within a certain area near a point (x, y)
in the images f(x, y), g(x, y), respectively.

By the above conversion, it is possible to 
make the brightness of the image g(x, y) coincident with 
that of the image f(x, y) . 

The weight W(x, y) may be the above values or
correlation coefficients of image data within a certain
area in the images f(x, y) and g(x, y).

This system has a feature that the histograms 
of two images eventually coincide with each other. 

Either system takes a linear conversion form 
of gain and offset. 
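
The conversion of expression (17) can be sketched for a single pixel as follows. The function name, the `half` window parameter and the scalar weight `w` (taken here as 1 by default) are assumptions for illustration; images are NumPy arrays indexed as [y, x], and the window is assumed to lie inside the image.

```python
import numpy as np

def convert_by_mean_std(f, g, y, x, half=3, w=1.0):
    """Convert g(y, x) so that its local mean and standard deviation follow f."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    fw, gw = f[win], g[win]
    # gain = w * sigma_f / sigma_g, offset = m_f (after removing m_g from g)
    return w * (fw.std() / gw.std()) * (g[y, x] - gw.mean()) + fw.mean()
```

With w equal to 1 this is again a linear conversion of g, with gain σf/σg and an offset determined by the two local averages.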

The above-mentioned gradation conversion is
the local brightness conversion in the vicinity of the
aimed pixel. Of course, the gradation conversion may be
applied to the whole image, or here to all the 256 lines,
according to the object and image characteristics. In
addition, when the brightness of one of two images is
made coincident with that of the other, the brighter
image can be chosen as the reference
by calculating, for example, the average brightness
value of each of the two images and comparing them, or by
calculating the average brightness value of each
certain area or point.

Although the gradation conversion is executed 
after the image brightness coincidence filter operation 
as in Fig. 7, this order may be reversed as in Fig. 8. 

The comparator 14 may be the means shown in
the system developed by the inventors and disclosed in
JP-A-61-212708. This comparator is formed of a
difference image detector, an inconsistency detector for
converting the difference image into a binary signal on
the basis of a threshold, and a feature extraction
circuit for calculating an area, a length (projection
length), coordinates and so on from the binary output.

The selection of a threshold for use in the 
conversion to binary values according to the invention 
will be further described with reference to Figs. 19 and 

25 20. 

When a difference image is converted into a
binary signal, false reports tend to occur at the
boundary between regions as described above. Thus, as
illustrated in Fig. 19, the detected image is processed
at each point to compute, within a local region, the
difference between the maximum and the minimum, the
average value, and the larger of the differentiated
values in x and y (hereinafter referred to as local
data). These values are multiplied by
separately determined parameters and added, that is,
subjected to the so-called multiplication addition
calculation, thereby generating a threshold. Accordingly,
since the differentiated values increase at, for
example, the boundary between regions where the
brightness change is large, the threshold increases,
thus preventing false reports. Of
course, it is not necessary to provide all three
values of the difference between the maximum and the
minimum, the average value and the larger of the
differentiated values in x and y; only one may be
produced. For example, if the gradation conversion is
performed, the average value need not be
computed.

If the difference between images is converted
into a binary signal by using the threshold, the false
report problem can be effectively reduced. The local
data can be obtained more easily by finding distributions
from the scatter diagram described later. Figs.
21-23 show scatter diagrams of the difference between
the maximum and the minimum within a local region of
images. A line segment is drawn on this distribution
data, and the error from the line segment is found. This
process is executed for each local data, and then a
threshold can be determined by the multiplication and
addition.

For example, it is assumed that the threshold
Th is calculated from the following equation.
Th = C3 × (local contrast) + C2 × (average brightness),
where the local contrast image is defined by the maximum
minus the minimum of 3×3 pixels, and the average
brightness image is expressed by the moving average of
3×3 pixels.
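
The threshold equation above can be sketched in Python as follows. This is only an illustration: images are plain lists of rows, the 3×3 window is clipped at the image border (an assumption, since the text does not say how borders are handled), the function names are hypothetical, and the coefficient values are arbitrary.

```python
def window3x3(image, x, y):
    """Pixel values in the 3x3 neighbourhood of (x, y), clipped at the image border."""
    height, width = len(image), len(image[0])
    return [
        image[ny][nx]
        for ny in range(max(0, y - 1), min(height, y + 2))
        for nx in range(max(0, x - 1), min(width, x + 2))
    ]

def threshold_image(image, c3, c2, offset=0.0):
    """Per-pixel Th = C3 * (local contrast) + C2 * (average brightness) + offset."""
    result = []
    for y in range(len(image)):
        row = []
        for x in range(len(image[0])):
            values = window3x3(image, x, y)
            contrast = max(values) - min(values)   # maximum minus minimum
            average = sum(values) / len(values)    # moving average
            row.append(c3 * contrast + c2 * average + offset)
        result.append(row)
    return result

# Illustrative image with a region boundary between dark and bright areas.
image = [
    [10, 10, 10, 10],
    [10, 10, 90, 90],
    [10, 10, 90, 90],
]
th = threshold_image(image, c3=0.5, c2=0.1)
```

In this example the threshold is small in the flat dark corner and larger at the boundary, where the local contrast is high.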

The two local contrast images to be compared
are represented by f(x, y), g(x, y), and Ve calculated
from

... (18)

is made equal to $\sigma_k$.

Similarly, the brightness average images are
represented by f(x, y), g(x, y), and the calculated Ve
is made equal to $\sigma_a$.

Thus, the following equation (19) can be
determined.

$$\sigma_g = C3 \times \sigma_k + C2 \times \sigma_a \qquad (19)$$

The same is done for another image. Thus,
coefficients C2, C3 can be found.

In order to solve the above equation of Th,
the standard deviation $\sigma_k$ is determined which is the
distance from a straight line of gradient 1 (m=1),
intercept 0 (n=0) to each plot data point in the
local contrast scatter diagram and which corresponds to
error. Similarly, the standard deviation $\sigma_a$ is found
which is the distance from a straight line of gradient
1, intercept 0 to each plot data point in the scatter
diagram of average brightness, and which corresponds to
error. In addition, the standard deviation $\sigma_g$ is
estimated which is the distance from a straight line of
gradient 1, intercept 0 to each plot data point in
the brightness scatter diagram of the two original
images, and which corresponds to error.

These values are substituted into the above
equation Th, giving rise to an equation having C2 and C3
like the equation (19). This operation is performed for
images at different points, thus producing other
equations in C2 and C3 with different coefficients. These
equations are solved as simultaneous equations, so that
coefficients C2, C3 are uniquely determined. Thus,
the threshold Th can be calculated from the above
equation with known C2, C3. Of course, the threshold Th
may be given by

Th = C3 × (local contrast) + C2 × (average brightness) + offset.
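
The determination of C2 and C3 from two image locations can be sketched as below. The sketch assumes that each standard deviation is the root-mean-square perpendicular distance of the scatter-diagram points from the line of gradient 1 and intercept 0, and that exactly two equations of the form of equation (19) are solved; the function names and the numerical values are hypothetical.

```python
from math import sqrt

def line_distance_sigma(f_values, g_values):
    """RMS perpendicular distance of the points (f, g) from the line g = f."""
    distances = [(g - f) / sqrt(2) for f, g in zip(f_values, g_values)]
    return sqrt(sum(d * d for d in distances) / len(distances))

def solve_c2_c3(eq1, eq2):
    """Solve sigma_g = C3 * sigma_k + C2 * sigma_a given two (sigma_k, sigma_a, sigma_g) triples."""
    k1, a1, g1 = eq1
    k2, a2, g2 = eq2
    det = k1 * a2 - k2 * a1
    if det == 0:
        raise ValueError("the two locations give dependent equations")
    c3 = (g1 * a2 - g2 * a1) / det   # Cramer's rule
    c2 = (k1 * g2 - k2 * g1) / det
    return c2, c3

# Illustrative triples measured at two different image locations.
c2, c3 = solve_c2_c3((4.0, 2.0, 2.2), (2.0, 6.0, 1.6))
```

With more than two locations the same equations could instead be fitted by least squares; the text only requires that enough equations be collected to fix both coefficients.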

For another setting system, the floating
threshold to be estimated may be given by the following
equation (20) that is a linear combination of local
brightness contrast and average values. The parameters
are calculated by multiple regression analysis with
reference to the scatter diagram information of two
pictures being compared.

Th = C0 + C1×|f-g| + C2×|T| + C3×|f'| + C4×|T|    (20)

The procedure for the setting will be given below.

(1) Detect images at a plurality of points (a set of
two chips).

(2) Generate a brightness scatter diagram from data of
detected image and reference image (using images not
including defects or images with defects removed).

(3) Find points enveloping a set of data in the scatter
diagram (extract a point of frequency 1 in estimation),
and extract local contrast and average data from the
pixels of image corresponding to the points.

(4) Adjust the parameters C0 to C4 by multiple regression
analysis on the basis of the information obtained by the
step (3).

(5) Select data to be used according to p value
(significance level) (find a combination in which the p
value is sufficiently reliable (0.05 or below)).

(6) Calculate threshold images from the estimated
parameters C0 to C4, and compare with difference images.

(7) If a false report is present, add its data and
adjust the parameters C0 to C4.

(8) Make a test inspection.

(9) Repeat the steps (7) and (8) if a false report
occurs.

In addition, as shown in Fig. 20, look-up
tables (LUTs) may be used in place of the multiplication
addition operation of coefficients and error mentioned
above. As illustrated in Figs. 19 and 20, the detected
image is processed to produce local maximum values and
local minimum values, and the contrast given by the
difference therebetween is fed to a LUT. Similarly, the
detected image is processed to produce a local average
value, which is fed to a LUT. The outputs from these LUTs
are supplied to another LUT, thereby producing a
threshold. The circuit arrangements shown in Figs. 19
and 20 limit the number of bits being used from 8 to 6 in
order for the scale of the LUTs to be appropriate. The
estimated threshold is supplied to the comparator 14
(Figs. 7 and 8), where it is used as a threshold for the
conversion to a binary signal. The data of the contents
of the LUTs are produced by using various images which
are processed by the same procedure as above to produce
error which is then interpolated.
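
A rough software model of the LUT arrangement is sketched below. It assumes that the 8-bit contrast and average values are reduced to 6 bits by dropping the two low-order bits, that the first-stage tables hold weighted terms clipped to 6 bits, and that the final table simply holds their sum; the table contents, coefficient values and names are hypothetical, since in the text the contents are derived from measured error.

```python
SHIFT = 8 - 6            # 8-bit data reduced to 6-bit LUT addresses
LUT_SIZE = 1 << 6        # 64 entries per table

def clip6(value):
    """Clip a value to the 6-bit range used between LUT stages."""
    return max(0, min(LUT_SIZE - 1, int(value)))

C3, C2 = 0.5, 0.1        # illustrative coefficients only

# First stage: one table for local contrast, one for local average.
contrast_lut = [clip6(C3 * (i << SHIFT)) for i in range(LUT_SIZE)]
average_lut = [clip6(C2 * (i << SHIFT)) for i in range(LUT_SIZE)]

# Second stage: a 64 x 64 table combining both first-stage outputs.
final_lut = [[a + b for b in range(LUT_SIZE)] for a in range(LUT_SIZE)]

def lut_threshold(local_max, local_min, local_avg):
    """Threshold for one pixel from its 8-bit local maximum, minimum and average."""
    a = contrast_lut[(local_max - local_min) >> SHIFT]
    b = average_lut[local_avg >> SHIFT]
    return final_lut[a][b]

threshold = lut_threshold(200, 100, 128)
```

Because every stage is a table lookup, an arbitrary non-linear relation can be loaded into the tables without changing the circuit, which is the practical advantage over the fixed multiplication addition.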

The images to be selected are of course in the
place where error is easy to detect. The prior art does
not use this way of deciding. The feature of the
present invention is not only the establishment of the
procedure but also theoretical decision.

Referring to Figs. 7 and 8, input means 15
formed of a keyboard, a disk or the like supplies to a
CPU 16 coordinates of array data within the chips on the
semiconductor wafer 4 which are obtained from the design
information. The CPU 16 generates defect inspection
data on the basis of the inputted coordinates, and
supplies it to a memory 17.

This defect inspection data can be indicated 
on display means such as a display or supplied to the 
outside from the output means. 

In addition, the operator can visually confirm
that the gradation conversion is properly made for
inspection by displaying the image (or image data) before
the gradation conversion together with the image (or
image data) after the gradation conversion, or by
displaying only the image (or image data) after the
gradation conversion.

Thus, images can be compared with high 
precision, and the object of the invention can be 
achieved with high sensitivity. 

While this embodiment employs bright field 
illumination, the images obtained by dark field 
illumination can be used for the inspection. Also, the 
kinds of defects can include defective shapes such as 
short-circuits or open-circuits or other foreign bodies. 
[Embodiment 2] 

Fig. 24 shows the second embodiment of a
pattern inspection method and apparatus according to the
invention. In this embodiment, an electron beam is used
to scan the sample and the electrons generated from the
wafer by the irradiation of the electron beam are
detected. An electron beam image of the scanned region
is thus obtained on the basis of the change of the
intensity, and used to make a pattern inspection. The
second embodiment overcomes the problems to be solved by
the invention by setting a defect decision threshold for
each pixel considering pattern shift and different
gradations.

This system includes a detection unit 101, an
image extractor 102, an image processor 103, and a whole
controller 104 for controlling the whole system.

The detection unit 101 will be described
first.

Referring to Fig. 24, an electron beam emitted
from an electron gun 31 passes through a magnetic field
lens 32 and an object lens 33 and is focused on the sample
surface to an extent of about pixel size in diameter.
In this case, a negative potential is applied to the
sample by a ground electrode 37 and a retarding
electrode 38 to decelerate the electron beam between the
object lens and the sample, thereby achieving high
resolution in the low-acceleration voltage region. When
the electron beam is irradiated on the sample, the
sample (wafer 1) emits electrons. A deflector 34
deflects the electron beam so that the electron beam
repeatedly scans the sample in the X-direction, and at
the same time the sample is continuously moved in the
Y-direction with the stage 2. The sample generates
electrons in synchronism with the repetitive X-direction
scanning and the continuous Y-direction movement, thus
producing a two-dimensional electron beam image of the
sample. The electrons emitted from the sample are
caught by a detector 35, and the signal is amplified by
an amplifier 36.

In this system, it is desired that a fast-deflection
static deflector be used for the deflector 34
for permitting the electron beam to repeatedly scan in
the X-direction, that a thermal field emission type
electron gun that can emit a large electron beam current
and thus reduce the irradiation time be used as the
electron gun 31, and that a semiconductor detector
capable of fast driving be used for the detector 35.

The image extractor 102 will be described
next.

The amplified signal from the amplifier 36 is
converted into a digital signal by an A/D converter 39,
and fed to a pre-processor 40. The pre-processor subjects
the input signal to dark level correction
(the dark level is the average of the gradations of
particular pixels during the beam blanking period),
electron-beam-current fluctuation correction (the beam
current is detected by an object diaphragm not shown and
the signal is normalized by the beam current), and
shading correction (correction for the variation of
light intensity due to beam scan position). Thereafter,
in the pre-processor, the signal is subjected to
a filtering process by a Gaussian filter, an averaging
filter or an edge emphasizing filter so that the picture
quality can be improved. If necessary, image distortion
is corrected. This pre-processing converts the
detected image into a form favorable to the later
defect decision processing.
A delay circuit 41 delays the signal by a
constant time. If the delay time is made equal to the
time in which the stage 2 is moved by one-chip pitch,
the delayed signal g0 and the non-delayed signal f0
become the image signals at the same locations of the
adjacent chips, and thus can be used for the previously
mentioned chip comparative inspection. Alternatively,
if the delay time is set to correspond to the time in
which the stage 2 is moved by the pitch of memory cell,
the delayed signal g0 and the non-delayed signal f0
become the image signals at the same locations of the
adjacent memory cells, and thus can be used for the
previously mentioned cell comparative inspection.

Thus, the image extractor 102 produces the
image signals f0 and g0 being compared. Hereinafter, f0
is referred to as the detected image, and g0 as the
compared image.

The image processor 103 will be described.

A pixel-unit aligner 42 shifts the compared
image so that the location at which the "degree of
matching" between the detected image as a reference and
the compared image is the maximum lies within 0 to 1 pixel.

Then, the filters F, F' in the brightness
coincidence filter operation unit are determined to make
the brightness inconsistency between the images the
minimum. As described above, it is necessary to
estimate various different statistics $\Sigma\Sigma xx$ in order to
solve the equations (7), (8) for the parameter
coefficients dx0, dy0 of filters by the method of least
squares. A statistics calculator 44 computes various
statistics $\Sigma\Sigma xx$, and a sub-CPU 45 receives the
statistics and calculates $\alpha$, $\beta$ from the equations (7),
(8).

A local gradation converter 46 makes gradation
conversion, permitting the above-mentioned f1 and g1 to
coincide in brightness.

A difference extractor 49 estimates a
difference image sub(x, y) between f1 and g1. That is,
the following equation is satisfied.

$$sub(x, y) = g_1(x, y) - f_1(x, y) \qquad (21)$$

A threshold calculator 48 receives the image
signals f1, g1 produced from the local gradation
converter 46 and $\alpha$, $\beta$, and computes two thresholds
thH(x, y) and thL(x, y) by which decision is made if the
difference image sub(x, y) has a defect. The threshold
thH(x, y) regulates the upper limit of sub(x, y),
and the threshold thL(x, y) regulates the lower limit of
sub(x, y). Fig. 25 shows the arrangement of the
threshold calculator 48. The equations for the
calculation in the threshold calculator will be given
below.



$$th_H(x, y) = A(x, y) + B(x, y) + C(x, y) \qquad (22)$$

$$th_L(x, y) = A(x, y) - B(x, y) - C(x, y) \qquad (23)$$

in which

$$\begin{aligned}
A(x, y) &= \{dx_1(x, y)\cdot\alpha - dx_2(x, y)\cdot(-\alpha)\} + \{dy_1(x, y)\cdot\beta - dy_2(x, y)\cdot(-\beta)\} \\
        &= \{dx_1(x, y) + dx_2(x, y)\}\cdot\alpha + \{dy_1(x, y) + dy_2(x, y)\}\cdot\beta
\end{aligned} \qquad (24)$$

$$\begin{aligned}
B(x, y) &= |\{dx_1(x, y)\cdot aa - dx_2(x, y)\cdot(-aa)\}| + |\{dy_1(x, y)\cdot bb - dy_2(x, y)\cdot(-bb)\}| \\
        &= |\{dx_1(x, y) + dx_2(x, y)\}\cdot aa| + |\{dy_1(x, y) + dy_2(x, y)\}\cdot bb|
\end{aligned} \qquad (25)$$

$$C(x, y) = \frac{max_1 + max_2}{2}\cdot\gamma + \varepsilon \qquad (26)$$

where aa, bb are real numbers of 0 to 0.5, $\gamma$ is a real
number larger than 0, and $\varepsilon$ is an integer larger than 0.

$$\begin{aligned}
dx_1(x, y) &= f_1(x+1, y) - f_1(x, y) \\
dx_2(x, y) &= g_1(x, y) - g_1(x-1, y) \\
dy_1(x, y) &= f_1(x, y+1) - f_1(x, y) \\
dy_2(x, y) &= g_1(x, y) - g_1(x, y-1) \\
max_1 &= \max\{f_1(x, y),\ f_1(x+1, y),\ f_1(x, y+1),\ f_1(x+1, y+1)\} \\
max_2 &= \max\{g_1(x, y),\ g_1(x-1, y),\ g_1(x, y-1),\ g_1(x-1, y-1)\}
\end{aligned} \qquad (27)$$


The first term A(x, y) of the right side of
equations (22), (23) for the calculation of thresholds
is provided for correcting the threshold in accordance
with $\alpha$, $\beta$ estimated by the shift detector 43. For
example, dx1(x, y) expressed by equation (27) is
regarded as a local rate of change in the x-direction of
the gradation of f1, and dx1(x, y)*$\alpha$ is a prediction
value of change of the gradation of f1 shifted by $\alpha$.
Thus, the first term, {dx1(x, y)*$\alpha$ - dx2(x, y)*(-$\alpha$)}, of
A(x, y) is a prediction value of how the gradation of
the difference image between f1 and g1 is changed for
each pixel when the images f1 and g1 are shifted by $\alpha$ and
-$\alpha$ in the x-direction, respectively. Similarly, the
second term is a prediction value in the y-direction.
The first term A(x, y) of the threshold is provided for
canceling $\alpha$, $\beta$.

The second term B(x, y) of the right side of
equations (22), (23) for the calculation of thresholds
is provided for allowing very small shift of pattern
edge, minute difference of pattern shape and pattern
distortion. When the equation (24) for A(x, y) and
equation (25) for B(x, y) are compared, it will be
understood that B(x, y) is the absolute value of the
prediction of gradation change of the difference image
with aa, bb. If the known shift is regarded as cancelled
by A(x, y), the addition of B(x, y) to A(x, y) is
regarded as shifting the aligned state by aa in the
x-direction and by bb in the y-direction. That is,
+B(x, y) allows a shift of aa in the x-direction and bb
in the y-direction.

The subtraction of B(x, y) from A(x, y) means
the shifting of the aligned state by -aa in the
x-direction and -bb in the y-direction. That is, -B(x, y)
allows a shift of -aa in the x-direction and -bb in the
y-direction. Provision of upper and lower thresholds
results in allowing the shift of ±aa, ±bb. The
allowance of shift can be controlled freely by setting
the parameters aa, bb at proper values.

The third term C(x, y) of equations (22), (23)
for the calculation of thresholds is provided for
allowing the very small difference between gradations.
The addition of C(x, y) means allowing the
gradation of g1 to be C(x, y) larger than that of f1. The
subtraction of C(x, y) means allowing the gradation
of g1 to be C(x, y) smaller than that of f1. Although
C(x, y) in this embodiment is expressed by the sum of a
typical gradation (here the maximum) in a local region,
multiplied by a proportional constant $\gamma$, and a constant $\varepsilon$,
it need not be limited to this function, but
may be a function suitable for a known way of gradation
change, if present. If it is known that the variation
width is proportional to the square root of gradation,
$C(x, y) = (max_1 + max_2)^{1/2}\cdot\gamma + \varepsilon$ should be used in place of the
equation (26). As in B(x, y), the gradation difference
allowance can be controlled freely by parameters $\gamma$, $\varepsilon$.

A defect decision circuit 50 receives the
output sub(x, y) from the difference extractor 49, and
the outputs thL(x, y), thH(x, y) from the threshold
calculator 48, and decides if the following expression
is satisfied.

$$th_L(x, y) < sub(x, y) < th_H(x, y) \qquad (28)$$

That is, if the above condition is satisfied, the pixel
at (x, y) is decided not to be defective. If it is not
satisfied, the pixel at (x, y) is decided to be
defective. The defect decision circuit 50 thus produces
def(x, y) of 0 for the non-defective pixel or 1 or
above for the defective pixel.
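
The per-pixel threshold calculation and defect decision of equations (21) to (28) can be sketched in Python as follows. The sketch reads equation (21) as the difference g1 minus f1, takes the first term A with braces rather than absolute values, handles interior pixels only, and stores images as lists of rows indexed `image[y][x]`; the function names and all parameter values are illustrative assumptions.

```python
def thresholds(f1, g1, x, y, alpha, beta, aa, bb, gamma, eps):
    """Return (thL, thH) at an interior pixel (x, y) following equations (22)-(27)."""
    dx1 = f1[y][x + 1] - f1[y][x]
    dx2 = g1[y][x] - g1[y][x - 1]
    dy1 = f1[y + 1][x] - f1[y][x]
    dy2 = g1[y][x] - g1[y - 1][x]
    max1 = max(f1[y][x], f1[y][x + 1], f1[y + 1][x], f1[y + 1][x + 1])
    max2 = max(g1[y][x], g1[y][x - 1], g1[y - 1][x], g1[y - 1][x - 1])
    a = (dx1 + dx2) * alpha + (dy1 + dy2) * beta        # known sub-pixel shift
    b = abs((dx1 + dx2) * aa) + abs((dy1 + dy2) * bb)   # allowed edge shift
    c = (max1 + max2) / 2 * gamma + eps                 # allowed gradation difference
    return a - b - c, a + b + c

def is_defect(f1, g1, x, y, **params):
    """Return 0 if thL < sub < thH at (x, y), otherwise 1."""
    th_l, th_h = thresholds(f1, g1, x, y, **params)
    sub = g1[y][x] - f1[y][x]
    return 0 if th_l < sub < th_h else 1

params = dict(alpha=0.0, beta=0.0, aa=0.3, bb=0.3, gamma=0.05, eps=2)
flat = [[100, 100, 100] for _ in range(3)]
small = [[100, 100, 100], [100, 105, 100], [100, 100, 100]]   # minor gradation change
large = [[100, 100, 100], [100, 160, 100], [100, 100, 100]]   # strong local deviation

small_result = is_defect(flat, small, 1, 1, **params)
large_result = is_defect(flat, large, 1, 1, **params)
```

In this example the 5-level deviation stays inside the tolerance band, while the 60-level deviation exceeds the upper threshold and is reported.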

A feature extractor 50a makes noise removal
process (for example, reduces/expands the def(x, y)),
thereby eliminating noise output, and then makes merging
process for the neighboring defective pixels. Thereafter,
it calculates amounts of various features such as
the center-of-mass coordinates, XY projection length and
area for each lump.
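
The merging and feature calculation can be illustrated by the following Python sketch. It assumes that neighboring defective pixels are merged by 8-connectivity, omits the reduce/expand noise removal, and uses hypothetical names for the function and feature keys.

```python
def extract_features(defect_map):
    """Merge neighbouring defective pixels (8-connectivity) and measure each lump."""
    height, width = len(defect_map), len(defect_map[0])
    seen = [[False] * width for _ in range(height)]
    lumps = []
    for y0 in range(height):
        for x0 in range(width):
            if not defect_map[y0][x0] or seen[y0][x0]:
                continue
            stack, pixels = [(x0, y0)], []
            seen[y0][x0] = True
            while stack:                      # flood fill over one lump
                x, y = stack.pop()
                pixels.append((x, y))
                for ny in range(max(0, y - 1), min(height, y + 2)):
                    for nx in range(max(0, x - 1), min(width, x + 2)):
                        if defect_map[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            lumps.append({
                "area": len(pixels),
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                "x_length": max(xs) - min(xs) + 1,   # projection length on X
                "y_length": max(ys) - min(ys) + 1,   # projection length on Y
            })
    return lumps

defect_map = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
lumps = extract_features(defect_map)
```

Here the three touching pixels form one lump and the isolated pixel forms another, each with its own area, center of mass and projection lengths.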

The whole controller 104 converts the
coordinates of the defective part into a coordinate
system on the sample, thereby removing false defects,
and finally collects defect data formed of position and
amounts of features on the sample.

The defect data can be displayed or produced
through the output means in the same way as in the
embodiment 1.

In addition, the image (or image data) before gradation
conversion and the image (or image data) after gradation
conversion are displayed together, or only the image (or
image data) after gradation conversion is displayed,
so that the operator can visually confirm that the
gradation conversion is properly made for inspection.

According to this embodiment, since the total
shift of a small region, very small shift of each
pattern edge and a minute gradation difference can be
allowed, a correct part can be prevented from being
recognized as a defect by mistake. Moreover, the
allowance of shift and gradation change can be easily
controlled by parameters aa, bb, $\gamma$ and $\varepsilon$.
[Embodiment 3]

Fig. 26 shows the third embodiment of a
pattern defect inspection method and apparatus according
to the invention. Referring to Fig. 26, in which like
elements corresponding to those in Figs. 7 and 8 are
provided, there are shown the image sensor 1 for
producing a gradation image signal according to the
brightness, or gradation, of the reflected light from the
semiconductor wafer 4 that has patterns being inspected,
the A/D converter 2 for converting the gradation image
signal from the image sensor 1 into the digital image
signal 9, the delay memory 3 for delaying the gradation
image signal, the semiconductor wafer 4 having the
patterns being inspected, and the stage 5 on which the
semiconductor wafer 4 of the patterns being inspected is
placed and which is moved in the X-direction,
Y-direction, Z-direction and θ-direction (rotation). In
addition, there are shown the object lens 6 placed
facing the semiconductor wafer 4, the light source 7
for illuminating the semiconductor wafer 4 of the
patterns being inspected, the half mirror 8 for
reflecting the illumination light to permit the light to
pass through the object lens 6 and illuminate the
semiconductor wafer 4, and at the same time allowing the
reflected light from the semiconductor wafer 4 to
transmit therethrough, and the digital image signal 9
produced from the A/D converter.

Thus, the illumination light from the light
source 7 is reflected and passed through the object lens
6 to illuminate the semiconductor wafer 4, providing,
for example, bright field illumination of the wafer.

The delay memory 3 may be a memory for storing
and delaying the image signal 9 by a pitch of one cell
or a plurality of repeated cells, or may be a delay memory
for storing and delaying the image signal 9 by a pitch
of one chip or a plurality of repeated chips.

The block 11 is used to align the digital
image signal 9 and the delayed digital image signal 10.
In this embodiment, it detects the amount of shift at
which the gradation difference between pixels is the
minimum by normalization correlation, and causes one
image to shift on the basis of this amount of shift so
that the two images can be aligned. The normalization
is made in order to reduce the effect of the brightness
difference between the images being aligned.

In other words, the stored image g(x, y) is
shifted relative to the detected image f(x, y), and the
position at which the correlation value becomes the
maximum is estimated from the following equations.



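$$R(\Delta x, \Delta y) = \frac{\sum_{x=0}^{X-1}\sum_{y=0}^{Y-1} \{f(x, y) - \bar{f}\}\,\{g(x+\Delta x, y+\Delta y) - \bar{g}(\Delta x, \Delta y)\}}{\sqrt{f_\sigma \cdot g_\sigma(\Delta x, \Delta y)}} \qquad (29)$$

$$\bar{f} = \frac{1}{XY}\sum_{x=0}^{X-1}\sum_{y=0}^{Y-1} f(x, y) \qquad (30)$$

$$\bar{g}(\Delta x, \Delta y) = \frac{1}{XY}\sum_{x=0}^{X-1}\sum_{y=0}^{Y-1} g(x+\Delta x, y+\Delta y) \qquad (31)$$

$$f_\sigma = \sum_{x=0}^{X-1}\sum_{y=0}^{Y-1} \{f(x, y) - \bar{f}\}^2 \qquad (32)$$

$$g_\sigma(\Delta x, \Delta y) = \sum_{x=0}^{X-1}\sum_{y=0}^{Y-1} \{g(x+\Delta x, y+\Delta y) - \bar{g}(\Delta x, \Delta y)\}^2 \qquad (33)$$
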



Here, although the image is continuously
detected by the image sensor, the detected image is
divided into lines as will be described later, and the
alignment is performed for line units. In the above
equations, the detected image has a size of X×Y pixels.

Although not shown, the normalization
correlation for use in finding the image shift need not
be made for the whole image, but may be performed for, for
example, small information-carrying images among K small
parts (size of X/K×Y pixels) into which a picture is
divided in the longitudinal direction of the image
sensor.

The decision of whether there is information
is made by, for example, differentiating each small
image to detect the presence or absence of an edge, and
selecting a small image having many edges. If the image
sensor is a linear image sensor of multi-tap structure
capable of parallel outputs, the image from each tap
output corresponds to the small image. This idea is
based on the fact that the images from the parallel
outputs have an equal shift. In addition, the image
sensor used here may be a TDI CCD image sensor of the
time delay integration type.
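
The alignment by normalization correlation can be sketched in Python as follows. The sketch searches integer shifts only and, as a simplifying assumption not stated in the text, evaluates the sums of equations (29) to (33) over the area where the shifted image overlaps the detected image; the function names, the search range and the test pattern are hypothetical.

```python
from math import sqrt

def normalized_correlation(f, g, dx, dy):
    """Correlation of f(x, y) with g(x + dx, y + dy) over the overlapping area."""
    pairs = [
        (f[y][x], g[y + dy][x + dx])
        for y in range(len(f)) for x in range(len(f[0]))
        if 0 <= y + dy < len(g) and 0 <= x + dx < len(g[0])
    ]
    if not pairs:
        return 0.0
    f_mean = sum(a for a, _ in pairs) / len(pairs)
    g_mean = sum(b for _, b in pairs) / len(pairs)
    num = sum((a - f_mean) * (b - g_mean) for a, b in pairs)
    f_var = sum((a - f_mean) ** 2 for a, _ in pairs)
    g_var = sum((b - g_mean) ** 2 for _, b in pairs)
    if f_var == 0 or g_var == 0:
        return 0.0                      # no information in a flat area
    return num / sqrt(f_var * g_var)

def best_shift(f, g, max_shift=2):
    """Return the integer shift (dx, dy) giving the highest correlation."""
    shifts = [
        (dx, dy)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    return max(shifts, key=lambda s: normalized_correlation(f, g, s[0], s[1]))

f = [
    [10, 50, 20, 80, 30, 60],
    [70, 15, 90, 25, 55, 35],
    [40, 85, 10, 65, 20, 95],
    [30, 60, 75, 15, 85, 45],
    [90, 20, 50, 70, 10, 80],
    [25, 95, 35, 45, 65, 15],
]
# g is f moved one pixel to the right, with a different gain and offset.
g = [[2 * row[x - 1] + 10 if x > 0 else 10 for x in range(6)] for row in f]
shift = best_shift(f, g)
```

Because the correlation is normalized, the gain and offset difference between the two images does not disturb the search, which is the reason given above for using the normalization.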

The gradation converter 13 converts the
gradations of both image signals having a different
brightness in order to make the brightness values equal.
Here, linear conversion is performed for each pixel by
gain and offset to achieve the brightness matching.

$$\sum_{dx}\sum_{dy} W(x, y, dx, dy)\cdot\{f(x, y) - a(x, y)\cdot g(x, y) - b(x, y)\}^2 \qquad (34)$$

$$W(x, y, dx, dy) = \max\left[1 - \frac{(f(x, y) - g(x+dx, y+dy))^2}{d^2},\ 0\right] \qquad (35)$$

$$a(x, y) = \frac{\sum\sum (W\cdot f\cdot g) - \frac{1}{\sum\sum W}\cdot\sum\sum (W\cdot f)\cdot\sum\sum (W\cdot g)}{\sum\sum (W\cdot g\cdot g) - \frac{1}{\sum\sum W}\cdot\sum\sum (W\cdot g)\cdot\sum\sum (W\cdot g)} \qquad (36)$$

$$b(x, y) = \frac{\sum\sum (W\cdot f) - a(x, y)\cdot\sum\sum (W\cdot g)}{\sum\sum W} \qquad (37)$$

In equations (36) and (37), W, f and g abbreviate
W(x, y, dx, dy), f(x, y) and g(x, y), and each double sum
runs over x = -dx to dx and y = -dy to dy.

The converter 12 converts both image signals
having a different brightness in order to make the
brightness values coincident. In this embodiment,
filtering operation is performed for all images to
achieve the brightness matching.

The produced image signals are compared by the
comparator 14. An inconsistency, if present, is
detected as a defect.
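
The gain and offset estimation performed by the gradation converter 13 (the weighted fit of equation (34) with the weight of equation (35)) can be sketched as below. The sketch assumes that the local window is given as two flat lists of corresponding samples, that the weight is evaluated pairwise on those samples, and that the gain and offset are the weighted least-squares minimizers of equation (34); the function name and the value of d are illustrative.

```python
def weighted_gain_offset(f_values, g_values, d=20.0):
    """Fit f ~ a * g + b over one window with W = max(1 - (f - g)^2 / d^2, 0)."""
    weights = [max(1.0 - (f - g) ** 2 / d ** 2, 0.0)
               for f, g in zip(f_values, g_values)]
    sw = sum(weights)
    if sw == 0:
        return 1.0, 0.0                   # no usable sample: leave g unchanged
    swf = sum(w * f for w, f in zip(weights, f_values))
    swg = sum(w * g for w, g in zip(weights, g_values))
    swfg = sum(w * f * g for w, f, g in zip(weights, f_values, g_values))
    swgg = sum(w * g * g for w, g in zip(weights, g_values))
    denom = swgg - swg * swg / sw
    if denom == 0:
        return 1.0, (swf - swg) / sw      # flat window: offset only
    a = (swfg - swf * swg / sw) / denom   # gain
    b = (swf - a * swg) / sw              # offset
    return a, b

# Window in which f is brighter than g by a constant 2 levels.
a, b = weighted_gain_offset([12, 22, 32, 42], [10, 20, 30, 40])
```

Samples whose brightness difference exceeds d receive zero weight, so a genuine defect inside the window has little influence on the estimated gain and offset.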

An image input unit 23 receives two images
being compared. The input images are supplied to a
scatter diagram generator 24, which then produces a
scatter diagram. The scatter diagram shows the
brightness values of the two images on the ordinate and
abscissa. The display 25 indicates the produced scatter
diagram. The input means 15 inputs, for example, a
threshold for the binary conversion of the absolute
value of a difference image, and a line segment of
the inputted threshold is plotted on the scatter diagram.
Thus, whether the input threshold is appropriate or not
can be decided easily by observing this scatter diagram.
Also, with reference to the displayed scatter diagram, it
is possible to determine a threshold suitable for the
images. One example of the scatter diagram is
shown in Fig. 33.
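
A scatter diagram of this kind is in effect a two-dimensional histogram of brightness pairs. The following Python sketch builds one and checks a candidate threshold against it; the function names and the sample data are hypothetical, and the display step is omitted.

```python
from collections import Counter

def scatter_diagram(f, g):
    """Count how often each brightness pair (f value, g value) occurs."""
    return Counter(
        (fv, gv)
        for f_row, g_row in zip(f, g)
        for fv, gv in zip(f_row, g_row)
    )

def fraction_outside(diagram, threshold):
    """Fraction of pixels whose absolute difference exceeds the threshold."""
    total = sum(diagram.values())
    outside = sum(n for (fv, gv), n in diagram.items() if abs(fv - gv) > threshold)
    return outside / total

f = [[10, 20], [30, 40]]
g = [[12, 20], [30, 55]]
diagram = scatter_diagram(f, g)
ratio = fraction_outside(diagram, threshold=5)
```

Plotting the lines g = f ± threshold over the counted pairs corresponds to the line segment of the inputted threshold described above; the fraction of points outside those lines indicates how many pixels the threshold would report.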

When W(x, y, dx, dy)=1, the following
equations can be satisfied.

$$a(x, y) = \frac{\sum\sum (f\cdot g) - \frac{1}{(2dx+1)\cdot(2dy+1)}\cdot\sum\sum f\cdot\sum\sum g}{\sum\sum (g\cdot g) - \frac{1}{(2dx+1)\cdot(2dy+1)}\cdot\sum\sum g\cdot\sum\sum g} \qquad (38)$$

$$b(x, y) = \frac{\sum\sum f(x, y) - a(x, y)\cdot\sum\sum g(x, y)}{(2dx+1)\cdot(2dy+1)} \qquad (39)$$

where each double sum runs over x = -dx to dx and
y = -dy to dy.

In addition, a line segment is fitted to the
plotted data group on the scatter diagram by means of
the method of least squares, and error can be found as
the deviation from this line segment.

If a straight line is expressed by Y = m·f(x, y) + n,
the least squares values of m and n are given
by the following equations.

$$m = \frac{\sum\sum (f\cdot g) - \frac{1}{(2dx+1)\cdot(2dy+1)}\cdot\sum\sum f\cdot\sum\sum g}{\sum\sum (f\cdot f) - \frac{1}{(2dx+1)\cdot(2dy+1)}\cdot\sum\sum f\cdot\sum\sum f} \qquad (40)$$

$$n = \overline{g(x, y)} - m\cdot\overline{f(x, y)} \qquad (41)$$

The error from the straight line is estimated
from, for example, the following equations.

... (42)

$$V_e = \frac{1}{(2dx+1)\cdot(2dy+1) - 2}\sum\sum \{g(x, y) - (m\cdot f(x, y) + n)\}^2 \qquad (43)$$

The threshold is calculated on the basis of
this error, and can be plotted on the scatter diagram.
For example, the threshold is a value proportional to
the square root of this Ve. Fig. 27 illustrates an
example of the structure for this.
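
The line fitting and the threshold derived from the error can be sketched in Python as follows. The sketch assumes an ordinary least-squares fit of g against f, takes Ve as the residual variance with two degrees of freedom removed, and uses a hypothetical proportionality constant k; the function names are illustrative only.

```python
from math import sqrt

def fit_line(f_values, g_values):
    """Least-squares gradient m and intercept n of g = m * f + n."""
    count = len(f_values)
    sf, sg = sum(f_values), sum(g_values)
    sfg = sum(f * g for f, g in zip(f_values, g_values))
    sff = sum(f * f for f in f_values)
    m = (sfg - sf * sg / count) / (sff - sf * sf / count)
    n = sg / count - m * sf / count
    return m, n

def residual_variance(f_values, g_values, m, n):
    """Error Ve: squared deviation from the fitted line, divided by (count - 2)."""
    residuals = [g - (m * f + n) for f, g in zip(f_values, g_values)]
    return sum(r * r for r in residuals) / (len(f_values) - 2)

def threshold_from_error(ve, k=3.0):
    """Threshold proportional to the square root of Ve (k is an assumed constant)."""
    return k * sqrt(ve)

f_values = [10, 20, 30, 40]
g_values = [21, 41, 61, 81]            # exactly g = 2 * f + 1
m, n = fit_line(f_values, g_values)
ve = residual_variance(f_values, g_values, m, n)
```

For well matched images the gradient approaches 1, the intercept approaches 0 and Ve becomes small, so the threshold obtained this way tightens as the brightness matching improves.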

A statistics calculator 26 fits the line
segment and calculates the error from
the segment. A threshold calculator 27 computes a
threshold from the produced statistics. Of course, an
arrangement may be provided by which the user can input
a threshold.

The images to be used on the scatter diagram
are two images being compared, for example, images of
pixel units after alignment. At each step of the image
processing, two images can be supplied to the image
input unit 23.

Figs. 28 and 29 show examples of two images
processed according to the system illustrated in Fig.
26. A pattern of lines and spaces is detected on the
lower right region of the images. The upper left region
has no pattern. Figs. 28 and 29 also show histograms of
images in the course of each process, and statistics of
the difference image. From the histograms, it will be seen
that the brightness values of two images are not
coincident at the first step.

First, a correlation value is estimated from
the normalization correlation, the position at which
this correlation value is high is found, and alignment
is performed with an accuracy of pixel unit. Then, the
two images aligned are subjected to local brightness
correction that is local gradation conversion. Finally,
filtering is made to permit the two images to coincide
in brightness, thereby further increasing the degree of
coincidence in the image brightness.

Figs. 30-32 show scatter diagrams of images at
each step of process. Since the two images are not
coincident in brightness at the stage where the images
are aligned with an accuracy of pixel units, the values
scatter away from the straight line of 45-degree gradient
on the scatter diagrams. However, after the local
gradation conversion, or local brightness correction,
and filtering process according to the invention, the
values are distributed around the straight line on the
scatter diagram. Thus, from the scatter diagrams it
will be understood that there is an effect of making the
brightness values of the two images uniform. The
gradient and intercept in the figures are the gradient
and intercept of a line segment fitted to the data of
scatter diagrams.

The gradient as the scale for the degree of
coincidence between the two images was first 0.705,
changed to 0.986 after the local gradation conversion,
or local brightness correction, and arrived at 0.991
after the filtering process. Thus, it will be
understood that the degree of coincidence between
brightness values is improved.

Moreover, the value of Ve indicating the
degree of coincidence between the two images was first
40.02, changed to 8.598 after the local gradation
conversion, or local brightness correction, and reached
7.477 after the filtering process. Thus, the degree of
brightness coincidence is increased. The Ve value is
not that of the whole image, but is, for example, a linearly
approximated error Ve of each region of 7×7 pixels
including the surroundings of each pixel as illustrated
in Figs. 30-32. From the images, it can be seen where
the brightness matching error is large.

Figs. 21-23 show scatter diagrams of local
contrast of images. In this embodiment, the contrast is
the difference between the maximum and minimum of the
surroundings of each pixel, for example 3×3 pixels.
The local contrast after the local gradation conversion
and filtering process according to the invention is
scattered near the straight line on the
scatter diagrams. The gradient and intercept have the
same meaning as in the previously given diagrams. The
images of Ve values are of linearly approximated Ve for
a region of 7×7 pixels including the surroundings of
each pixel in the local contrast image.

Figs. 33-36 show examples of scatter diagrams
and thresholds. In Fig. 33, since two images are
different, the threshold is set to be large for
preventing the erroneous detection of the images. Fig.
34 is a scatter diagram after the local gradation
conversion, or brightness correction according to the
invention. Since the degree of coincidence between the
two images is high, the set threshold is small. Fig. 35
is a scatter diagram after the brightness coincidence.
The threshold is further reduced. Fig. 36 is a scatter
diagram after the linear gradation conversion of one
image applied per image rather than per pixel. The threshold
has an offset on the scatter diagram.

Fig. 37 shows an example of divisional linear gradation
conversion for each image unit. In this example, two
divisions are shown.

The scatter diagram and threshold can be widely used as
a basis for determining the defect detection
sensitivity or for confirming whether the established
threshold is appropriate.

The generation and display of these scatter diagrams,
or the calculation of the threshold using the data of
the scatter diagrams, can be performed by using images
detected before the start of inspection. In addition,
it will be clear that if the generation of scatter
diagrams and the threshold setting are carried out for
each image in synchronism with the image detection, the
inspection can be conducted with high sensitivity. The
image detection may be made after the completion of the
respective processes. While the image processing is
achieved by a pipeline type process as described above,
it may be achieved by another arrangement.



[Embodiment 4] 

Fig. 38 illustrates the fourth embodiment of a 
pattern defect inspection method and apparatus according 
to the invention. 

The construction shown in Fig. 38 is the same 
as that of Fig. 26 except for the image brightness 
coincidence filter 12. In Fig. 38, like elements 
corresponding to those in Fig. 26 are identified by the 
same reference numerals. 

The operation of the arrangement shown in Fig. 
38 is the same as in the third embodiment in that the 
image sensor 1 generates a gradation image signal 
according to the brightness of the reflected light from 
the semiconductor wafer 4 of patterns being inspected, 
and that the local gradation converter 13 makes linear 
conversion by gain and offset for each pixel, thereby 
achieving brightness coincidence. 

In this embodiment, the comparator 14 compares the image
signals produced from the local gradation converter 13,
thereby detecting an inconsistency as a defect. The
detected image signal undergoes a fixed sequence of
pipeline type processes, and finally the defect and its
features are outputted.

The operation of the inspection apparatus 
having the above construction will be described below. 

Referring to Fig. 38, the stage 5 is scanned in the X
direction (for example, the direction perpendicular to
the array direction of the sensor chips on the sensor
surface of the one-dimensional image sensor 1) by the
illumination light focused by the object lens 6 while
the stage 5 is moved at a uniform speed, so that a
necessary region of the semiconductor wafer 4 having
the patterns being inspected can be scanned by the
illumination light. Consequently, the image sensor 1
detects the brightness information (gradation image
signal) of the memory mats 21 and peripheral circuits 22
within the pattern formed on the semiconductor wafer 4,
that is, within the chip 20.

When the stage completes the movement along one row, it
moves quickly in the Y-direction (perpendicular to the
X-direction) to reach the start point of the next row.
In other words, while the image sensor 1 detects the
image of the pattern formed on the semiconductor wafer
4, the stage 5 repeats the uniform movement along a row
and the quick movement to the start of the next row. Of
course, step and repeat type inspection may be
employed.

The A/D converter 2 converts the output (gradation image
signal) from the image sensor 1 into the digital image
signal 9. This digital image signal 9 has 10 bits. Of
course, a signal of about 6 bits can be processed
without problem. However, in order to detect a very
small defect, a somewhat larger number of bits is
required. Thus, a ten-bit format is used here to provide
some margin.

Referring to Fig. 38, the coordinates of array data
within the chip on the semiconductor wafer 4, obtained
on the basis of the design information, are inputted by
the input means formed of a keyboard or disk. The CPU 16
generates defect inspection data according to the
inputted coordinates of the array data within the chip
on the semiconductor wafer 4, and causes it to be stored
in the memory 17. The stored defect inspection data also
has added to it data of defect reliability, indicating
the certainty of defect, which will be described later.

This defect inspection data can, if necessary, be
displayed on display means or printed out by output
means such as a printer, together with the defect
reliability. The defect inspection data and defect
reliability can be transmitted by communication
equipment to other inspection apparatus, optical review
apparatus, SEM type review apparatus or defect
classification apparatus (of which there are various
kinds, such as apparatus for classifying defect features
into defect categories and apparatus using a neural
network), or to external storage means such as a server.
Of course, only the defect reliability may be displayed,
printed out or supplied to other means.

The image input unit 23 is used to input the two images
being compared. These images are supplied to the scatter
diagram generator 24, which then produces a scatter
diagram. Fig. 39 shows how the scatter diagram is
generated. The ordinate and abscissa of the scatter
diagram indicate the brightness values of the two
images f(x, y) and g(x, y) being compared, respectively.
Instead of the brightness of the image signals of the
patterns being inspected, the scatter diagram may show
the local contrast of brightness, the local average, or
a combination thereof on the ordinate and abscissa. The
generated scatter diagram is displayed with the
frequency converted into gradation values, as
illustrated in Fig. 39. Here, a frequency of 0 is
indicated by gray, a low frequency by white, and a high
frequency by black. Of course, the scatter diagram may
indicate only the presence or absence of data.
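
The generation of the scatter diagram described above amounts to counting how often each pair of brightness values occurs at the same pixel position in the two images. A minimal Python sketch follows. The function names scatter_diagram and display_level, and the particular linear mapping from frequency to an 8-bit gray level, are assumptions made for illustration; the text specifies only that a frequency of 0 is gray, a low frequency white and a high frequency black.

```python
from collections import Counter


def scatter_diagram(f, g):
    """Count how often each brightness pair (f(x, y), g(x, y)) occurs."""
    counts = Counter()
    for row_f, row_g in zip(f, g):
        for a, b in zip(row_f, row_g):
            counts[(a, b)] += 1
    return counts


def display_level(frequency, max_frequency):
    """Map a frequency to a gray level: 0 -> gray, low -> white, high -> black.

    The linear mapping used here is a hypothetical choice for this sketch.
    """
    if frequency == 0:
        return 128
    return round(255 * (1 - frequency / max_frequency))


# Two small hypothetical images with 2x2 pixels each.
f_image = [[1, 2], [1, 3]]
g_image = [[1, 2], [1, 4]]
counts = scatter_diagram(f_image, g_image)
```

Pairs that never occur are simply absent from the counter and are reported with a frequency of 0.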

From the above scatter diagram of the image signals, the
calculator 26 calculates the frequency on the scatter
diagram, a function of the position or relative distance
on the scatter diagram, or information obtained by
referring to a look-up table. The calculated information
is added to the inconsistency information as the defect
reliability, that is, as a measure of how likely the
inconsistency is to correspond to a defect, and stored
in the memory 17.

Here, a high frequency in the scatter diagram indicates
that the corresponding point is unlikely to be a defect.
For example, a pixel corresponding to the black data on
the scatter diagram in Fig. 39 has a high frequency, and
hence is a normal portion with a high probability. A
pixel corresponding to white data has a low frequency,
that is, only a few pixels have that brightness, and
hence it is a defect with a high probability. Thus, the
frequency information is an important parameter for
indicating the certainty of defect. Similarly, if the
brightness values of the two images being compared are
equal, the points are distributed on a straight line
having a gradient of 45 degrees on the scatter diagram.
Therefore, the absolute positions on the scatter diagram
are also an important parameter for indicating the
certainty of defect. The pixels corresponding to data
deviating from the straight line having a gradient of
45 degrees (not shown) have low frequencies, and thus
they can most probably be considered as defects.
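
The two parameters discussed above, namely the frequency of a pixel on the scatter diagram and its deviation from the straight line having a gradient of 45 degrees, can be computed for every pixel as sketched below. This Python example is illustrative only. The function name certainty_inputs is hypothetical, and the deviation is taken as the perpendicular distance |f - g| / sqrt(2) from the line g = f, which is one natural reading of the text.

```python
import math
from collections import Counter


def certainty_inputs(f, g):
    """Per-pixel scatter-diagram frequency and distance from the 45-degree line."""
    pairs = [(a, b) for row_f, row_g in zip(f, g) for a, b in zip(row_f, row_g)]
    counts = Counter(pairs)
    frequency = [[counts[(a, b)] for a, b in zip(row_f, row_g)]
                 for row_f, row_g in zip(f, g)]
    # Perpendicular distance of the point (a, b) from the line b = a.
    distance = [[abs(a - b) / math.sqrt(2) for a, b in zip(row_f, row_g)]
                for row_f, row_g in zip(f, g)]
    return frequency, distance


# Hypothetical images: one pixel of g differs strongly from f.
f_image = [[10, 10, 10], [10, 10, 90]]
g_image = [[10, 10, 10], [10, 10, 20]]
frequency, distance = certainty_inputs(f_image, g_image)
```

In this example the deviating pixel has a frequency of 1 and a large distance, so both parameters mark it as a probable defect.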

Figs. 40A and 40B show straight lines estimated by the
method of weighted least squares using a plurality of
pixels in the area around each point of interest. The
relative distances of the two images being compared are
the distances from the straight lines.

As illustrated in Fig. 40A, an approximate straight line
is estimated from the data within an area set around
each pixel on the scatter diagram. Alternatively, a
weighted least squares straight line of the two images
being compared is estimated by using the fact that the
frequency is a parameter indicating the certainty of
defect, or by using a plurality of pixels in an area set
around each point where the frequency is a constant or
above. The size of the area is changed locally according
to the frequency in the scatter diagram. For
flexibility, it is desirable to obtain the area size by
inputting the frequency and referring to the look-up
table.

The distance from the approximate straight line is
plotted as in Fig. 40B, and this distance is regarded as
the certainty of defect and is outputted or displayed.
The smaller the distance, the more probably the image
can be decided to be normal. The larger the distance,
the closer the image is to a defect.
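
The estimation of an approximate straight line by weighted least squares and the distance of a point from that line can be sketched as follows. This Python example is a simplified illustration and not the procedure of Figs. 40A and 40B itself. It is assumed here that the weight of each point is its frequency on the scatter diagram, and the function names weighted_line_fit and distance_to_line are hypothetical.

```python
import math


def weighted_line_fit(points):
    """Weighted least-squares fit of y ~ a * x + b.

    points is a list of (x, y, weight) tuples. Using the scatter-diagram
    frequency as the weight is an assumption made for this sketch.
    """
    total = sum(w for _, _, w in points)
    mean_x = sum(w * x for x, _, w in points) / total
    mean_y = sum(w * y for _, y, w in points) / total
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in points)
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def distance_to_line(x, y, a, b):
    """Perpendicular distance of the point (x, y) from the line y = a * x + b."""
    return abs(a * x - y + b) / math.sqrt(a * a + 1)


# Three frequent points on a line and one rare point given zero weight.
points = [(0, 0, 10), (1, 1, 10), (2, 2, 10), (1, 5, 0)]
a, b = weighted_line_fit(points)
outlier_distance = distance_to_line(1, 5, a, b)
```

Giving rare points a small or zero weight keeps probable defects from pulling the fitted line toward themselves, so that their distance from the line remains large.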

From Fig. 40B, it will be seen that the frequency
becomes small as the distance from the approximate
straight line increases, thus indicating that the
certainty of defect increases. The points where the
frequency is a constant or below, for example 1 or
below, are considered as having a high degree of
certainty of defect, and are thus excluded from the
region used for the approximate straight line. The local
gradation converter 13 in Fig. 38 may estimate an
approximate straight line for each pixel by the method
shown in Figs. 40A and 40B and make the gradation
conversion on the basis of the straight lines.

Moreover, the scatter of the whole image about the
straight line can be computed by the equations (42) and
(43) used in the third embodiment.

This information can be used as a measure of the degree
of coincidence over the whole image.

Thus, the certainty of the inconsistency information
produced from the inspection apparatus can be decided by
use of the information obtained from the scatter
diagram.

The display 25 displays the generated scatter diagram
alone or with other information. The input means 15 is
used to input thresholds, for example, a threshold for
the binary conversion of the absolute value of a
difference image, and the line segment of the inputted
threshold is plotted on the scatter diagram. By
referring to this scatter diagram, it can easily be
decided whether the inputted threshold is appropriate
or not.

In addition, by referring to the information of the
displayed diagram, it is possible to determine a
threshold suitable for the image. In other words, if the
threshold is determined according to the above-given
certainty of defect, defects can be detected with higher
reliability. For example, a threshold is determined
appropriately for each pixel, or according to the
frequency in the scatter diagram. The conversion between
the frequency and the threshold is performed by using
the look-up table (LUT) as illustrated in Fig. 8. The
contents of the look-up table, that is, the manner of
conversion, are determined before the inspection.
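
The conversion from frequency to threshold by a look-up table can be sketched as below. This Python example is illustrative only. The break points and threshold values are hypothetical, as is the assumed direction of the table (a smaller threshold for a lower frequency); the text states only that the contents of the table are determined before the inspection.

```python
import bisect

# Hypothetical table: frequencies below 2 use threshold 8, frequencies from
# 2 to 9 use 12, from 10 to 99 use 20, and 100 or above use 30.
FREQUENCY_BREAKS = (2, 10, 100)
THRESHOLDS = (8, 12, 20, 30)


def threshold_from_frequency(frequency):
    """Look up the threshold assigned to a scatter-diagram frequency."""
    return THRESHOLDS[bisect.bisect_right(FREQUENCY_BREAKS, frequency)]


def is_inconsistent(f_value, g_value, frequency):
    """Compare the absolute brightness difference with the per-pixel threshold."""
    return abs(f_value - g_value) > threshold_from_frequency(frequency)
```

For example, under this hypothetical table a brightness difference of 10 is reported as an inconsistency for a pixel whose frequency is 1, but not for a pixel whose frequency is 50.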

As illustrated in Fig. 38, the images used for the
scatter diagram, which are the two images being
compared, for example, images aligned with an accuracy
of pixel units, can be supplied to the image input unit
23 at each step of the image processing.

Fig. 42 shows an example of the process for the two
images based on the system illustrated in Fig. 38. The
processed portion is an inspected pattern that has been
flattened by CMP (chemical mechanical polishing). A
line and space pattern (a pattern of a large number of
lines arranged with a constant spacing) is detected at
the lower right of the image. The upper left region has
no pattern. A histogram of the images is also shown in
the course of each process. From the histograms, it will
be seen that at the first stage the brightness values of
the two images are not coincident. First, the
correlation values of the images are estimated by
normalized correlation, the position where the
correlation value is high is determined, and alignment
of the images is performed with an accuracy of pixel
units. Then, the two aligned images are subjected to
local gradation conversion, or local brightness
correction.
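
The alignment with an accuracy of pixel units by normalized correlation can be sketched as an exhaustive search over small integer shifts. The following Python example is a minimal illustration under stated assumptions: the function names normalized_correlation and best_shift, the search range max_shift, and the restriction of the correlation to the overlapping region of the two images are all choices made for this sketch and are not taken from the text.

```python
import math


def normalized_correlation(a, b):
    """Normalized correlation of two equally long lists of brightness values."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def best_shift(f, g, max_shift=2):
    """Return (dx, dy) such that f[y][x] best matches g[y + dy][x + dx]."""
    height, width = len(f), len(f[0])
    best = (-2.0, 0, 0)  # (score, dx, dy); any real score exceeds -2
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            a, b = [], []
            # Visit only the pixels for which both images have data.
            for y in range(max(0, -dy), min(height, height - dy)):
                for x in range(max(0, -dx), min(width, width - dx)):
                    a.append(f[y][x])
                    b.append(g[y + dy][x + dx])
            score = normalized_correlation(a, b)
            if score > best[0]:
                best = (score, dx, dy)
    return best[1], best[2]
```

Because the correlation is normalized by the mean and spread of each image, the shift can be found even when the two images differ in overall brightness, which is why the alignment can precede the local gradation conversion.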

Figs. 43A and 43B illustrate scatter diagrams of the
images. The two images are not coincident in brightness
at the stage of alignment with an accuracy of pixel
units, and thus the data are scattered away from the
straight line having a gradient of 45 degrees in the
scatter diagram. However, after the local gradation
conversion process (the system based on the equations
(34)-(37)) according to the invention, the scatter
diagram has a distribution near the straight line. Thus,
it will be understood that the process is effective in
making the brightness values of the two images equal.
The gradient and intercept are those of a line segment
fitted to the data of the scatter diagram.

According to the invention, the gradient as a measure of
the degree of coincidence between the two images is
0.705 at first and changes to 0.986 after the local
gradation conversion, or local brightness correction.
Thus, the degree of coincidence between brightness
values is increased. The above-mentioned Ve indicating
the degree of coincidence between the two images is
40.02 at first and changes to 8.598 after the local
gradation conversion, or local brightness correction.
The degree of coincidence between brightness values is
improved.

Although these values are calculated for the whole of
each image unit being compared, the above Ve may be
estimated for each local area subjected to gradation
conversion in the system shown in Fig. 40.

In the examples shown in Figs. 43A and 43B, information
on the certainty of defect is added to the inconsistency
by using the scatter diagram after the local brightness
correction, according to the above procedure. The pixels
distributed at the periphery of the scatter diagram have
a high degree of certainty of defect. The threshold can
be established by using straight lines having a gradient
of 45 degrees so as to hold the distributed data
therebetween. Of course, even at the stage where the
images are aligned with an accuracy of pixel units,
information on the certainty of defect can be similarly
extracted from the scatter diagram. However, since the
threshold is determined so as to hold the distributed
data therebetween, it cannot be set with high
sensitivity.

Therefore, for determining a threshold it is more
desirable to use a scatter diagram generated after the
local brightness correction.

If the generation or display of the scatter diagram, or
the calculation of thresholds using the data of the
scatter diagram, is performed for each image or each
pixel of an image in synchronism with the image
detection, the inspection can be made with high
sensitivity. While the image processing is of the
pipeline type as described above, another type of image
processing can be used.

Figs. 44A-44C show lists of defect output. The values
listed are the inconsistency outputs resulting from
comparing the gradation-converted images by the
comparator 14. The lists include the values of defect
reliability in addition to the defect number and the
features of the defect such as coordinates, length and
area. Here, the defect number indicates the order in
which the chips being inspected were scanned. The
defect coordinates indicate the position at which a
defect of a chip being inspected was detected, in a
coordinate system with, for example, an alignment mark
or origin provided as a reference. The defect lengths
are the lengths along the X-axis and Y-axis,
respectively. Of course, the lengths along the major
axis and minor axis may be calculated.

The units of these values are, for example, microns,
depending on the necessary precision. The defect
reliability is the information obtained from the
above-mentioned scatter diagram. For example, the defect
reliability is expressed by the frequency, and the
distance from the approximate straight line, on the
scatter diagram of the pixels of a defective image.

Fig. 44A is based on the frequency of a defective image
in the scatter diagram. The lower the frequency, the
higher the defect reliability value. Fig. 44B is based
on the distance from the approximate straight line of a
defective image in the scatter diagram. The longer the
distance, the higher the defect reliability value. Fig.
44C is based on the position of a defective image in the
scatter diagram. The reliability value of the defect
increases as the defect lies farther from the straight
line with a gradient of 45 degrees. Of course, the
defect reliability value may combine a plurality of
factors, such as the frequency of a pixel of a defective
image and the distance thereof from the approximate
straight line on the scatter diagram. If the defect
covers a plurality of pixels, a statistic such as the
average, maximum or median of the frequencies of the
pixels is calculated. Thus, the inconsistency
information with the reliability added can be used for
the calculation of the fatality of a defect.
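
For a defect covering a plurality of pixels, the statistic of the pixel frequencies mentioned above can be computed as sketched below. This Python example is illustrative only. The function names are hypothetical, and the reciprocal used to turn a frequency into a reliability value is an assumed mapping chosen merely so that a lower frequency gives a higher reliability, as stated for Fig. 44A.

```python
import statistics


def defect_frequency(frequencies, statistic="median"):
    """Reduce the scatter-diagram frequencies of a defect's pixels to one value."""
    reducers = {
        "average": statistics.mean,
        "maximum": max,
        "median": statistics.median,
    }
    return reducers[statistic](frequencies)


def reliability_from_frequency(frequency):
    """Hypothetical mapping: the lower the frequency, the higher the reliability."""
    return 1.0 / frequency


# Frequencies of the three pixels belonging to one hypothetical defect.
pixel_frequencies = [1, 2, 6]
reliability = reliability_from_frequency(defect_frequency(pixel_frequencies))
```

A defect reliability based on the distance from the approximate straight line could be reduced over the pixels of a defect in the same manner.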

The fatality of defect is the fatality of defect to the
inspected pattern, depending on, for example, the size
of defect and the coordinates (region) in which the
defect exists. The smaller the pattern size, the higher
the fatality of the defect of the same size. If this
fatality is used with the reliability, the fatality can
be decided with high precision. As a result, the defects
of the inspected pattern can be more accurately
diagnosed by the processes.

A supplementary explanation will be made of the size of
image. The size of image, or the unit of alignment
(matching) of images, can be determined by the following
method. The amount of shift between the two images being
compared is estimated in units of fine divisions, as
illustrated in Fig. 45. The amount of shift is, as
illustrated, detected separately in the X-direction and
Y-direction. This shift data can be spectrum-analyzed as
shown by the waveform in Fig. 46. In this
spectrum-analyzed diagram, the ordinate indicates the
spectrum density, and the abscissa the frequency.

In this figure, we consider the highest frequency with
high density, namely 0.011. This frequency is determined
by, for example, an apparatus characteristic or a
vibration characteristic such as the travelling
characteristic of the stage. The results of the spectrum
analysis indicate that the shift between the two images
repeats at this frequency. It is now assumed that the
reciprocal of this frequency value, or 88 lines, is a
unit of image, or a unit of matching. If a large
peak-to-peak value of shift appears within an image, it
is difficult to match both images with high precision.
If the unit of image is assumed to be 1/4 of the
reciprocal of this frequency, the amount of shift can be
reduced to 1/2 of the peak shift or below. In addition,
if the unit of image is made 1/8 of the reciprocal of
the frequency, the amount of shift can be reduced to 1/4
of the peak shift or below.

Thus, as the image unit is made finer, the precision of
matching between the images should increase
correspondingly. However, the pattern information
included within the image decreases, and as a result the
image matching precision does not increase. Therefore,
the upper limit of the image size can be determined from
the results of the spectrum analysis according to the
necessary matching precision, and the lower limit
thereof can be decided, from the standpoint of assuring
the pattern information, by the pattern space
information (information on the region with no pattern
formed) depending on the patterns being compared. While
the highest frequency is considered in the above
description, the amount of shift may be taken into
account and the frequency corresponding to a large
amount of shift may be considered instead; effective
results can be obtained in this case as well.
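
The determination of the image unit from the spectrum of the shift data can be sketched as follows. This Python example is a simplified illustration, not the analysis actually used to obtain Fig. 46. It computes a discrete Fourier transform directly, takes "high density" to mean a spectral density of at least a given fraction of the maximum, and returns the reciprocal of the highest such frequency. The function names and the parameter density_ratio are assumptions made for this sketch.

```python
import cmath
import math


def power_spectrum(shifts):
    """Return (frequency in cycles per line, spectral density) pairs."""
    n = len(shifts)
    mean = sum(shifts) / n
    centered = [s - mean for s in shifts]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum


def image_unit_from_shift(shifts, density_ratio=0.2):
    """Reciprocal of the highest frequency whose density is high.

    density_ratio is a hypothetical parameter: a frequency counts as having
    high density if its density is at least this fraction of the maximum.
    """
    spectrum = power_spectrum(shifts)
    peak = max(power for _, power in spectrum)
    strong = [freq for freq, power in spectrum if power >= density_ratio * peak]
    return 1.0 / max(strong)


# Synthetic shift data: a slow drift plus a vibration repeating every 88 lines.
shifts = [2 * math.sin(2 * math.pi * i / 176) + math.sin(2 * math.pi * i / 88)
          for i in range(176)]
unit = image_unit_from_shift(shifts)
```

With this synthetic data the highest frequency with high density is 1/88, about 0.011 as in the text, so the image unit is about 88 lines; a quarter of it would be 22 lines.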

The above process may be made separately for the
X-direction and Y-direction or only for the stage
movement direction as in the case of an accumulation
type linear image sensor.

The size of image at the step of gradation conversion
may be made equal to the above-given image size in the
system based on the equations (34) and (37) or may be
determined locally as in the system mentioned with
reference to Fig. 40.

According to the embodiments of the invention, the
defects can be detected with high sensitivity without
being affected by the change of pattern brightness at
different places. In addition, the pattern with the
brightness greatly scattering in a dark region such as
memory mats 21 can be inspected with high sensitivity.
The same effect can be expected not only for the memory
elements but for the logic elements in the microcomputer
or ASIC. Therefore, high-reliability inspection can be
performed as compared with the prior art.

While bright field illumination is employed in the above
embodiments, microscope illumination such as dark field
illumination or ring band illumination may be used. The
illumination used does not depend on the illumination
wavelength. In addition, the inspection may naturally
use a secondary electron image of the sample surface
obtained by detecting the secondary electrons emitted
from the sample when an electron beam is irradiated on
the sample. Moreover, the inspection may be made a
plurality of times with the kind of illumination or the
conditions of illumination changed, the results of the
inspections being logically summed for the final result.
Alternatively, the logical product thereof is used to
accurately detect defects. For example, the image defect
may be diagnosed by the defect distribution and number.
Moreover, the detector is not limited to the linear
image sensor, but may be a TV camera by which the
pattern image is detected. The kinds of defect may be
shape defects such as short circuits and open circuits,
or foreign bodies.
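
The logical sum and logical product of the results of a plurality of inspections can be sketched as set operations on the detected defect positions. This Python example is illustrative only. The function name combine_inspections is hypothetical, and it is assumed that defects found in different inspections are identified by identical coordinates, whereas a practical apparatus would presumably allow some positional tolerance.

```python
def combine_inspections(defect_sets, mode="sum"):
    """Combine defect coordinates from several inspections.

    mode="sum" keeps a defect found in any inspection (logical sum);
    mode="product" keeps only defects found in every inspection.
    """
    sets = [set(s) for s in defect_sets]
    if mode == "sum":
        return set().union(*sets)
    if mode == "product":
        return set.intersection(*sets)
    raise ValueError("mode must be 'sum' or 'product'")


# Hypothetical results under two illumination conditions.
bright_field = {(10, 12), (40, 7)}
dark_field = {(40, 7), (55, 3)}
any_condition = combine_inspections([bright_field, dark_field], "sum")
all_conditions = combine_inspections([bright_field, dark_field], "product")
```

The logical sum favors detecting every candidate, while the logical product keeps only the candidates confirmed under all conditions.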

According to the above embodiments, more effective
analyzing processes can be used.

By employing inspection data with reliability added, it
is possible to execute review of defects more
effectively. For example, in the defect lists shown in
Figs. 44A-44C, the order of the defects is changed
(sorted) according to the reliability of the defects.
For example, the defects are rearranged in descending
order of certainty of defect. By this arrangement,
review and confirmation of defects can be performed in
order of reliability, from highest to lowest. It is
possible not only to completely prevent misdetection by
the inspection apparatus, but also to select the
inconsistencies on the boundary between the defect and
the normal state. If the defect rearrangement is made
according not only to the reliability but also to the
information on the coordinates and size of the defects,
more effective defect review and confirmation can be
performed.

In other words, the decision of fatality can be
accurately executed by the addition of reliability, and
use of this fatality enables effective defect review and
confirmation with higher precision. A threshold may be
provided for the reliability or fatality so that only
the defects higher than the threshold can be reviewed.
Moreover, the same effect can be expected for the
classification of defects. In addition, yield diagnosis
and prediction can be made without problem by use of
only the true defects. Thus, it is possible to reduce
the load of the visually reviewing operation for the
inconsistency, and increase the reliability of the yield
prediction.
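
The rearrangement of a defect list by reliability and the selection of only the defects above a threshold can be sketched as follows. This Python example is illustrative only. The record fields, the sample values and the function name review_order are hypothetical and do not reproduce the lists of Figs. 44A-44C.

```python
def review_order(defects, min_reliability=0.0):
    """Keep defects whose reliability exceeds the threshold, highest first."""
    selected = [d for d in defects if d["reliability"] > min_reliability]
    return sorted(selected, key=lambda d: d["reliability"], reverse=True)


# Hypothetical defect list: number, coordinates in microns, and reliability.
defects = [
    {"number": 1, "x": 120.5, "y": 33.0, "reliability": 0.20},
    {"number": 2, "x": 48.0, "y": 210.5, "reliability": 0.95},
    {"number": 3, "x": 301.0, "y": 77.5, "reliability": 0.60},
]
to_review = review_order(defects, min_reliability=0.5)
```

Here defect 2 is reviewed first, followed by defect 3, and defect 1 is omitted from the review because its reliability does not exceed the threshold.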

While the above embodiments of the invention employ the
comparative inspection method chiefly using an optical
microscope, scanning electron microscopes or other
detectors using infrared light or X-rays may be used
with the same effect. In addition, while the above
embodiments employ the method based on the comparison
between images, the reliability of defects added to the
defect information can also be applied to apparatus such
as foreign body inspection apparatus in which foreign
bodies over a large area are detected by scattered light
without use of comparison.

According to the embodiments 1-4 of the invention,
defects can be detected with high sensitivity without
being affected by the brightness change of the pattern
at each position. A pattern of which the brightness
greatly scatters in a dark region such as the memory
mats 21 can be inspected with high sensitivity. Also,
high-precision image matching can be performed without
being affected by the vibration characteristic of the
equipment. Therefore, as compared with the prior art,
the inspection can be made with high reliability.

The contents of the specifications and drawings of
Japanese Patent Application Nos. 110383/1998 and
264275/1998 that are the basic applications for the
priority of this application are incorporated in those
of this application by this reference.



